alexandrevilain / temporal-operator

Temporal Kubernetes Operator
https://temporal-operator.pages.dev/
Apache License 2.0
155 stars 33 forks source link

codify custom search attributes for TemporalNamespace #776

Open TheHiddenLayer opened 2 months ago

TheHiddenLayer commented 2 months ago

Closes #567

This PR codifies custom search attributes, so one can specify:

apiVersion: temporal.io/v1beta1
kind: TemporalNamespace
metadata:
  name: default
  namespace: demo
spec:
  customSearchAttributes:
    name1: value1
    name2: value2
    name3: value2
  // additional fields
TheHiddenLayer commented 2 months ago

TODO(@TheHiddenLayer) add tests?

TheHiddenLayer commented 2 months ago

Hi @alexandrevilain,

  1. I'm experiencing various breakages running make test-e2e-dev.
  2. I needed helm and buildx to properly build the container when running make test-e2e-dev. Thinking of updating the local development docs to include those deps. Also, do you prefer the local development docs stay minimal/brief? I noticed there's a lot of commands in the Makefile that aren't mentioned in the local development docs.
  3. Any way you could give me some guidance on what tests you want to see for custom search attributes? This is actually my first time contributing to a k8s operator, and I'm referencing this blog post for the basics.

P.S. regarding:

  1. I'm experiencing various breakages running make test-e2e-dev.

the error is different each time. Here's the error I happened to see this time:

I0724 23:18:56.709046     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 5 milliseconds
I0724 23:18:57.177835     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0724 23:18:57.664057     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0724 23:18:58.167209     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 3 milliseconds
I0724 23:19:06.648034     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 7993 milliseconds
[api-check] The API server is not healthy after 4m0.02803015s

Unfortunately, an error has occurred:
    context deadline exceeded

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
TheHiddenLayer commented 2 months ago

This is running on an M3 apple silicon with colima

fails again, this time with a different error.

docker save temporal-operator > /tmp/temporal-operator.tar
OPERATOR_IMAGE_PATH=/tmp/temporal-operator.tar go test ./tests/e2e -v -timeout 60m -args "--v=4"
I0724 20:24:34.366456   59239 kind.go:157] Creating kind cluster temporal-fcc1c45
I0724 20:24:34.367030   59239 command.go:37] "Found Provider tooling already installed on the machine" command="kind"
I0724 20:24:34.574632   59239 kind.go:171] Launching:kind create cluster --name temporal-fcc1c45
F0724 20:24:35.841364   59239 env.go:375] Setup failure: failed to create kind cluster: exit status 1 : exit status 1: Creating cluster "temporal-fcc1c45" ...
 • Ensuring node image (kindest/node:v1.30.0) 🖼  ...
 ✓ Ensuring node image (kindest/node:v1.30.0) 🖼
 • Preparing nodes 📦   ...
 ✗ Preparing nodes 📦 
Deleted nodes: ["temporal-fcc1c45-control-plane"]
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
FAIL    github.com/alexandrevilain/temporal-operator/tests/e2e  2.034s
FAIL
make: *** [test-e2e-dev] Error 1
TheHiddenLayer commented 2 months ago

even while running the make test-e2e-dev on the main branch I get

=== RUN   TestPersistence/cassandra_persistence/Temporal_cluster_ready_after_upgrade#03
panic: test timed out after 1h0m0s

@alexandrevilain are you able to run the whole e3e test suite without any errors or timeouts?

alexandrevilain commented 1 month ago

Sorry @TheHiddenLayer I'm currently having issues with internet connectivity while I'm in holidays. I can't even pull the cert-manager image ...

alexandrevilain commented 1 month ago

Hi @TheHiddenLayer following PR succesfully ended e2e without any retry: https://github.com/alexandrevilain/temporal-operator/pull/777

TheHiddenLayer commented 1 month ago

Hi @TheHiddenLayer following PR succesfully ended e2e without any retry: #777

Nice, I'm glad that it worked. It could be something with my logic then? Pls LMK what you think about this.

TheHiddenLayer commented 1 month ago

@alexandrevilain I haven't forgotten about this PR. I'm still planning to finish it off. I’ve been tied up lately with work. Thanks for your patience!

sonarcloud[bot] commented 1 month ago

Quality Gate Passed Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

alexandrevilain commented 6 days ago

Hi @TheHiddenLayer ! Do you need some help for the last few things to fix ?

TheHiddenLayer commented 6 days ago

hey @alexandrevilain! the feature itself works as expected. I've done manual tests confirming:

BUT I didn't deeply investigate why this doesn't pass some E2E tests, so any assistance on that would be great.

I wanted to filter out and run a few tests that were failing to isolate my debugging, but the option flags I passed into the test command didn't do anything. So, the fact that I had to run the entire E2E pipeline (30-50min on my computer) has made this arduous to debug.

sonarcloud[bot] commented 1 day ago

Quality Gate Passed Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

TheHiddenLayer commented 1 day ago

I'll clean up the comments/dead code/debugging log statements