Different runs of an update failed because of container unhealthy

javiertuya commented 4 months ago

This update https://github.com/giis-uniovi/retorch-st-eShopContainers/pull/92 failed twice, at different times, due to an unhealthy container; the third run passed:

Gather all debug data.
Tell me when this is done, so I can create the combined update.
Start fixing it.

augustocristian commented 4 months ago

I've gathered all the data, @javiertuya. You can proceed with the combined update.

javiertuya commented 4 months ago

@augustocristian Updating now; you can start fixing it.

javiertuya commented 4 months ago

@augustocristian The combined update succeeded, but the merge to main failed. So far, 3 of the 5 latest runs have failed, during a period with no load.

giis-qabot commented 3 months ago

@augustocristian This is a reminder about this issue because it has not been updated for 10 days

javiertuya commented 3 months ago

@augustocristian @ClaudiodelaRiva This issue was opened on July 6th but is still not solved. The newest updates are failing.

giis-qabot commented 3 months ago

@augustocristian This is a reminder about this issue because it has not been updated for 10 days

giis-qabot commented 3 months ago

@augustocristian This is a reminder about this issue because it has not been updated for 10 days

augustocristian commented 2 months ago

The root cause of the problem is that the ordering service is not ready. As Javier commented in https://github.com/giis-uniovi/retorch-st-eShopContainers/issues/80#issuecomment-2130117795, I am going to move the migration data insertions to the msql container so that the ordering service is not blocked.
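The planned fix is to move the data insertion into the database container. Independently of that, a bounded readiness wait can make this kind of failure easier to diagnose: the run stops with a clear result instead of hanging on a service that never becomes ready. The following Java sketch is only an illustration and not part of the repository; `ReadinessWait` and `waitUntil` are hypothetical names, and the simulated check stands in for a real call (e.g. to the ordering service health endpoint), which is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical bounded-wait helper (not part of the repository).
 * Polls a readiness check until it succeeds or the timeout expires.
 */
public class ReadinessWait {

    /** Returns true if the check succeeded before the timeout, false otherwise. */
    public static boolean waitUntil(BooleanSupplier check, Duration timeout, Duration interval)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            if (check.getAsBoolean()) {
                return true; // the service reported ready
            }
            if (!Instant.now().isBefore(deadline)) {
                return false; // gave up: the service never reported ready
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated check that becomes ready on the third poll; it stands in
        // for an assumed real check such as an HTTP call to a health endpoint.
        int[] polls = {0};
        boolean ready = waitUntil(() -> ++polls[0] >= 3,
                Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("ready=" + ready + " after " + polls[0] + " polls");
    }
}
```

Returning a boolean rather than throwing leaves it to the caller to decide whether a timeout should fail the TJob or only be logged.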

augustocristian commented 2 months ago

Since the last merge to main is also failing, I've created a branch with the same changes as main, and it passes without problems. I am going to clear the data on the slave and check whether something is corrupted.

augustocristian commented 2 months ago

@javiertuya I've seen that the last merge to main has failed; this time the cause is different (it looks like concurrent access to, or copying of, files in the same location). These failures are hard to reproduce on my own computer, so I was thinking (it's not the first time) of moving the execution of the test suites from the agent-slave to a dedicated containerized E2E test executor. I see the following GAINS and PAINS:

GAINS:

I think that isolating the execution should solve all the concurrency problems (at least for sequential execution with Selenium).
It also lets us measure resource usage (memory, CPU and so on) precisely, separating the consumption of the agent from that of the test execution itself.
I hope it also improves the reproducibility of the executions, since we can control the image through Dependabot and check when moving to a newer stable Maven/Java version is useful.

PAINS:
It's another abstraction layer, so getting the debug information and the JUnit reports could be more complex.
Problems creating the containers during the execution (though I think they are less likely).

Have you tried this, or do you have any experience doing it? Is it advisable? I've seen a tutorial using Node.js (https://medium.com/free-code-camp/how-to-dockerize-your-end-to-end-acceptance-tests-dbb593acb8e0), so I think that, by exposing the right interfaces, it would also be possible in Java.
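A containerized executor would mostly change how each TJob is launched. A minimal sketch, assuming Docker is available on the agent: the class below only builds the `docker run` command for one TJob and does not execute it. The image tag, the network name `eshop-net`, the paths and the `ContainerizedTJobCommand` class are assumptions for illustration; the `--network` option is where connectivity with the shared resources would have to be solved.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: builds the command that would run one TJob's tests
 * inside its own container instead of directly on the agent.
 */
public class ContainerizedTJobCommand {

    public static List<String> dockerRunCommand(String tjobName, String hostWorkspace,
                                                String network, String image, String testFilter) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        cmd.add("run");
        cmd.add("--rm");                          // remove the container when the TJob ends
        cmd.add("--name");
        cmd.add("tjob-" + tjobName);              // one container per TJob
        cmd.add("--network");
        cmd.add(network);                         // must reach the SUT containers (assumed network)
        cmd.add("-v");
        cmd.add(hostWorkspace + ":/workspace");   // reports are written back to the host
        cmd.add("-w");
        cmd.add("/workspace");
        cmd.add(image);
        cmd.add("mvn");
        cmd.add("-B");                            // Maven batch (non-interactive) mode
        cmd.add("test");
        cmd.add("-Dtest=" + testFilter);          // Surefire test selection
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = dockerRunCommand("tjoba", "/var/jenkins/ws/tjoba",
                "eshop-net", "maven:3.9-eclipse-temurin-17", "LoginTest");
        // Only prints the command; new ProcessBuilder(cmd).inheritIO().start()
        // would actually launch it.
        System.out.println(String.join(" ", cmd));
    }
}
```

With one container per TJob, `docker stats` could then report the memory and CPU used by each test execution separately from the agent.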

javiertuya commented 2 months ago

@augustocristian This is a completely different approach; it is the GitHub/GitLab approach. If you try to simulate this using Jenkins (e.g. by running each TJob in a container), you may face other problems, such as connectivity with shared resources.

If the problem is what happened in the last failed build, it could be solved by setting a different build directory for each TJob.
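A sketch of that idea, assuming the TJobs share one workspace on the agent: each TJob gets its own directory derived from its name and the build number, so two TJobs never copy files to the same location. `TJobBuildDir`, `prepare` and the `tjobs/<name>-<build>` layout are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: gives each TJob its own build directory under the
 * shared workspace so that TJobs do not write to the same location.
 */
public class TJobBuildDir {

    /** Replaces characters that are unsafe in directory names with '_'. */
    static String sanitize(String tjobName) {
        return tjobName.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    /** Creates (if needed) and returns workspace/tjobs/<name>-<build>. */
    public static Path prepare(Path workspace, String tjobName, int buildNumber) throws IOException {
        Path dir = workspace.resolve("tjobs")
                .resolve(sanitize(tjobName) + "-" + buildNumber);
        Files.createDirectories(dir); // no-op if it already exists
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path ws = Files.createTempDirectory("ws");
        Path a = prepare(ws, "tjob A", 42);
        Path b = prepare(ws, "tjob/B", 42);
        // Each TJob would then write its build output only below its own directory.
        System.out.println(a + "\n" + b);
    }
}
```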

giis-uniovi / retorch-st-eShopContainers

Different runs of an update failed because of container unhealthy #94
