Open kkourt opened 10 months ago
More typical timeout example. I think one solution would be to move away from these deployments that are flaky "by nature": they somehow fail to deploy on time even in an environment with enough resources. We have talked in the past about moving away from those and maybe using https://github.com/GoogleCloudPlatform/microservices-demo, especially now that Tetragon is independent of Cilium for those tests.
Let's make a good first issue to do the migration to the microservices demo. I think it makes a lot of sense.
This might be fixed by #2345. Let's keep an eye on the Tetragon e2e tests for a couple of weeks. If they are stable, we can close the issue.
UPDATE: It seems the test is still flaky after switching to the otel-demo app. It failed in #2417: https://github.com/cilium/tetragon/actions/runs/8966724879/attempts/1
Hi @lambdanis https://github.com/cilium/tetragon/actions/runs/8966724879/job/24623050943#step:6:9683
time="2024-05-06T09:26:21Z" level=info msg="PROCESS_EXEC:894 => FINAL MATCH "
time="2024-05-06T09:26:21Z" level=info msg="DONE!"
--- FAIL: TestLabelsDemoApp (241.38s)
--- FAIL: TestLabelsDemoApp/Run_Workload (118.15s)
--- FAIL: TestLabelsDemoApp/Run_Workload/Run_Workload (118.10s)
labels_test.go:53: failed to install demo app. run with `-args -v=4` for more context from helm: exit status 1
labels_test.go:53: failed to install demo app. run with `-args -v=4` for more context from helm: exit status 1
labels_test.go:53: failed to install demo app. run with `-args -v=4` for more context from helm: exit status 1
labels_test.go:60: failed to install demo app after 3 tries
FAIL
The checker seems to have succeeded, but the demo app failed to install. Maybe this is another flaky test?
Btw, I'm wondering why we have to install and check labels in parallel instead of installing the demo app successfully and then running the label checker test? https://github.com/cilium/tetragon/blob/a3b867cb9e77fd1a305c89e4955c0a993e83d8cf/tests/e2e/tests/labels/labels_test.go#L97-L121
Btw, I'm wondering why we have to install and check labels in parallel instead of installing the demo app successfully and then running the label checker test?
I'm not sure, to be honest. The only reason I can think of is that it can speed up the tests, because technically the checker can finish before all the deployments are ready. If running them one after the other makes debugging easier, maybe we could consider changing that. Do you remember anything about that, @willfindlay?
Hit a TestLabelsDemoApp failure (https://github.com/cilium/tetragon/actions/runs/7472506683/job/20334842561?pr=1948) in https://github.com/cilium/tetragon/pull/1948. Seems like a flake. Details: