CorrelAid / correlaid_website

Source code for the CorrelAid website
https://correlaid.org

CD fails "Error: page.waitForURL: Navigation failed because page was closed!" #461

Closed jstet closed 10 months ago

jstet commented 10 months ago

Can't reproduce this locally. The only thing I found so far:

jstet commented 10 months ago

I deployed the page manually because CD failed.

jstet commented 10 months ago

@KonradUdoHannes any ideas?

KonradUdoHannes commented 10 months ago

The main question is whether it works locally. If it does not, we might actually have a bug in the code or in the test.

If it works locally, the issue might be related to flakiness of the test. This usually means that waiting for components to load in the browser during testing is not 100% reliable/deterministic, and sometimes a test fails because something did not load in time for the next test step. To check for flakiness we should simply rerun the CD workflow; if it passes on the rerun, flakiness is likely the issue. To deal with it, Playwright has an automatic rerun feature, which is currently configured for a maximum of 3 reruns. This has worked so far with our flaky tests, but we can always get unlucky on all the reruns. To make this less likely we can increase the maximum number of reruns. This should be fine for us, but it has the disadvantage that actual bugs, which have no chance of passing on a rerun, get rerun as well. Therefore the rerun number should not be too high.
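To put rough numbers on "getting unlucky with all the reruns", here is a minimal sketch. It assumes that attempts are independent and that every flaky test passes with the same fixed probability, which real tests will not satisfy exactly. The function names and the pass rates are hypothetical and only serve as illustration; the retry count corresponds to the `retries` option in the Playwright config.

```typescript
// Sketch only: assumes independent attempts and a constant pass rate per attempt.

/** Probability that one flaky test fails its first run and every retry. */
function probAllAttemptsFail(passProbability: number, maxRetries: number): number {
  if (!(passProbability >= 0 && passProbability <= 1)) {
    throw new RangeError("passProbability must be between 0 and 1");
  }
  if (!Number.isInteger(maxRetries) || maxRetries < 0) {
    throw new RangeError("maxRetries must be a non-negative integer");
  }
  const attempts = maxRetries + 1; // the first run plus all retries
  return Math.pow(1 - passProbability, attempts);
}

/** Probability that at least one of `flakyTests` equally flaky tests exhausts its retries. */
function probRunFails(passProbability: number, maxRetries: number, flakyTests: number): number {
  if (!Number.isInteger(flakyTests) || flakyTests < 0) {
    throw new RangeError("flakyTests must be a non-negative integer");
  }
  const single = probAllAttemptsFail(passProbability, maxRetries);
  return 1 - Math.pow(1 - single, flakyTests);
}

// Hypothetical numbers: 5 flaky tests that each pass 80% of the time.
for (const retries of [0, 3, 6]) {
  const p = probRunFails(0.8, retries, 5);
  console.log(`retries=${retries}: workflow fails with probability ${p.toFixed(4)}`);
}
```

Under these assumptions each extra retry shrinks the chance of a spurious failure by a constant factor, so a modest increase already helps a lot. A real bug fails on every attempt, so higher retry counts only add runtime in that case, which is the reason to keep the number moderate.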

KonradUdoHannes commented 10 months ago

I addressed one test (test lc chapter pages) that was particularly flaky as part of #464. Not sure whether that addresses this issue. The flaky test was reproducible locally, but if the problem is a race condition, it might have been highly sensitive to the current computational load on the machine running the test.

@jstet Do you remember (or can you look up) whether it was the "test lc chapter pages" test that was causing the trouble? If so, we could now confidently continue and see whether the problem appears again. If it was another test case, it might make sense to investigate more right away.

jstet commented 10 months ago

No, it was other pages as well. The workflows failed again tonight with an error similar to the earlier one: https://github.com/CorrelAid/correlaid_website/actions/runs/5194835030/jobs/9366937234

KonradUdoHannes commented 10 months ago

I see. So basically it failed for a commit for which it had already passed when I merged. This suggests that the flakiness really is a problem. I'll have a look at the situations that caused trouble to see whether we can improve it.

KonradUdoHannes commented 10 months ago

@jstet The current failure is reproducible locally, and the test failure is justified (i.e. it is the purpose of our test), as there seems to be an actual problem that I'll address in #466. We should continue with this issue after the bugs are resolved.

KonradUdoHannes commented 10 months ago

#466 was resolved, and I suggest keeping this issue open for a little while (maybe a few days) and closing it if no other immediate issues with the CD pipeline arise.

KonradUdoHannes commented 10 months ago

So far no similar issues were encountered, as far as I saw. I'll close this issue.