eclipse-tractusx / tractusx-edc

Apache License 2.0

Flaky postgresql test #1027

Open ndr-brt opened 9 months ago

ndr-brt commented 9 months ago

WHAT

There's a flaky test in the "postgresql" test cluster that fails from time to time, e.g.: https://github.com/eclipse-tractusx/tractusx-edc/actions/runs/7785340855/job/21227835752

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 4 weeks with no activity.

wolf4ood commented 7 months ago

After the refactoring and the parallelization of tests done here, the test suite has been stable for the past week. I would close this, and if flaky tests emerge again I would reopen it.

wolf4ood commented 5 months ago

It seems that this is still valid; reopening for investigation.

https://github.com/eclipse-tractusx/tractusx-edc/actions/runs/9268208526

ndr-brt commented 5 months ago

It looks like a new runtime with Jetty is started while another one is still running, and the ports are defined statically, so they are the same for every runtime (even across different tests). One solution could be to generate new ports for every test; another is to use the same runtime for all the tests (which would also make them run significantly faster).
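To illustrate the "generate new ports for every test" option, here is a minimal sketch. It is not the project's actual helper: the class name `FreePorts` and the method `findFreePort` are made up for illustration. It only relies on the standard behaviour that binding a `ServerSocket` to port 0 lets the OS pick an unused ephemeral port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of tractusx-edc or the upstream Connector.
final class FreePorts {

    private FreePorts() {
    }

    /**
     * Asks the OS for an unused ephemeral port by binding to port 0,
     * then releases the socket so a test runtime can bind to that port.
     */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each test would call `FreePorts.findFreePort()` when building its runtime configuration instead of reading a static constant. Note that the port is only known to be free at the moment of the call; nothing reserves it afterwards.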

wolf4ood commented 5 months ago

It seems strange that a runtime is started while another one is still running; we don't run them in parallel, afaik.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 2 weeks with no activity.

github-actions[bot] commented 5 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

ndr-brt commented 4 months ago

It looks like the last time they broke on main was 3 weeks ago; maybe we fixed them unintentionally (perhaps with the upstream e2e test runtime refactoring?).

wolf4ood commented 4 months ago

It could be, but I think I saw some failures on dependabot PRs. I would leave this open for now; it probably needs further investigation.

wolf4ood commented 4 months ago

It seems that a similar failure also happens upstream, though less frequently:

https://github.com/eclipse-edc/Connector/actions/runs/9743379424/job/26886697525?pr=4312

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 2 weeks with no activity.

ndr-brt commented 4 months ago

One possibility is that the Participant object (which is instantiated statically) gets a free random port, but that port is then used by the postgresql container as its host port. The probability is quite low, to be honest, but it could still happen. I will refactor it a little, then let's see if that fixes the issue.
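The window described here can be reproduced in isolation. The sketch below is only a demonstration under assumptions: `PortRaceDemo` is a made-up class, and a plain `ServerSocket` stands in for whatever external service (for instance the container's host port mapping) grabs the port in between.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class; it does not use any EDC or Testcontainers API.
final class PortRaceDemo {

    private PortRaceDemo() {
    }

    /** Returns true if binding the "free" port fails after someone else took it. */
    static boolean secondBindFails() {
        int port;
        // Step 1: pick a free port early (as a statically created object would) and release it.
        try (ServerSocket probe = new ServerSocket(0)) {
            port = probe.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Step 2: another service binds the same port before the runtime starts.
        try (ServerSocket otherService = new ServerSocket(port)) {
            // Step 3: the runtime finally tries to bind the port it was given.
            try (ServerSocket runtime = new ServerSocket(port)) {
                return false;
            } catch (BindException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The longer the gap between picking the port and binding it, the larger this window becomes, which is why a port chosen at class-initialization time is more exposed than one chosen right before startup.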

wolf4ood commented 4 months ago

We can try, but the linked upstream failure uses a global service from actions and not a containerized postgres.

I also saw failures on e2e tests without postgres.

wolf4ood commented 4 months ago

For example, this one, which is not using pg:

https://github.com/eclipse-tractusx/tractusx-edc/actions/runs/9958935449/job/27514534911?pr=1427

ndr-brt commented 4 months ago

The upstream error is more specific because it says "A binding for port 32762 already exists", which means that another binding with the same port is defined in the same runtime (maybe because different calls to getFreePort returned the same value).

This one instead says "Address already in use", which means that an external service is using the same port; it could be either postgres or mockserver (some tests use it).

In any case, I think it's something related to getFreePort; maybe we could add a memory to it, to avoid returning the same value twice in the same execution. I'll open an issue upstream.
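A rough sketch of what "adding a memory" could look like follows. This is not the upstream getFreePort implementation: `RememberingPorts`, `nextFreePort`, and the retry limit are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical allocator; names and retry limit are illustrative assumptions.
final class RememberingPorts {

    // Ports already handed out during this JVM execution.
    private static final Set<Integer> HANDED_OUT = ConcurrentHashMap.newKeySet();
    private static final int MAX_ATTEMPTS = 100;

    private RememberingPorts() {
    }

    /** Returns a port that is free right now and was never returned before in this JVM. */
    static int nextFreePort() {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            int candidate = askOsForFreePort();
            // add() returns false when the port is already remembered, so we retry.
            if (HANDED_OUT.add(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("No unused port found after " + MAX_ATTEMPTS + " attempts");
    }

    private static int askOsForFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This would only address the "A binding for port ... already exists" case, where the same value is handed out twice inside one execution. It cannot prevent "Address already in use", because an external process knows nothing about the remembered set.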

ndr-brt commented 3 months ago

My previous theory has been debunked: tests are still failing with the same issue :shrug: https://github.com/eclipse-tractusx/tractusx-edc/actions/runs/10141302101/job/28038270072

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 4 weeks with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

ndr-brt commented 2 months ago

Ok, I think my last theory was also wrong... -_- https://github.com/eclipse-tractusx/tractusx-edc/actions/runs/10990267975/job/30510137981

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 4 weeks with no activity.