Open sssoleileraaa opened 4 years ago
This could be related to https://github.com/freedomofpress/securedrop-client/pull/1134 because that's when I started seeing the flakes. Keeping track of how often this starts happening here.
We've confirmed that the problem persists even when we revert #1134
Also the flakes only seem to occur in circleci. You can see how rerunning the workflow on the main
branch resolved it here for #347
: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client?branch=main.
We're able to reproduce locally now (see STR in this Issue's description)
We'll continue to keep an eye on this. @rmol has encountered these flakes as well, and has committed to at least a brief investigation during the 8/20-9/2 sprint.
No progress on this last sprint but test integrity is important, so @rmol will take a look 9/3-9/17, same timebox.
We'll continue to watch for flaky tests on client PRs during the 9/17-10/1 sprint and @rmol will help investigate as warranted.
new example of the same issue still occurring: https://circleci.com/api/v1.1/project/github/freedomofpress/securedrop-client/4822/output/103/0?file=true&allocation-id=5f99bacd4771a47ad89e8a53-0-build%2F5A1221DF
Given these continued test flakes, we're keeping this on the 10/28-11/12 sprint for continued investigation. Would be good to better understand cause of the flake before the November 4 release, to reduce risk of unintended regression beyond the test flakiness.
The nice thing about these flakes is that the second time I run ci, the tests seem to always pass.
@creviera has committed to poking at John's findings in #1158 during the 11/12-11/25 sprint to see if we can get closer to a root cause here.
I also noticed this today in the CI for many times while reviewing #1222. It happened on various different widgets.
https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5091 https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5092 https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5093 https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5094
I have observed three failures in a row on main
this morning, all different tests:
tests/gui/test_widgets.py::test_ExportDialog__on_preflight_success
: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/608/workflows/c8010cbe-3f9d-417f-acb8-105d53746f36/jobs/5182tests/gui/test_widgets.py::test_ExportDialog__show_starting_instructions
: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/608/workflows/ac89b43d-1d3c-4b59-8e58-dcff795ddb27/jobs/5180tests/gui/test_widgets.py::test_PrintDialog__on_preflight_success_enabled_after_preflight_failure
: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/608/workflows/4b6fc463-45f7-44b8-b44c-2463a4cd2bf1/jobs/5178I had a little bit of time to look at this again and I think we need to look carefully at self.button_animation.frameChanged.connect(self.animate_activestate)
and self.header_animation.frameChanged.connect(self.animate_header)
to understand how the handlers might be called after the export/print dialogs are closed.
As a reminder, we show an animation next to the file name in the header when we are booting up sd-devices
and we show an animation in the continue_button when it's making a request to sd-devices
I checked if removing the animations fixed the problem, and even though it made the test failures a lot less likely to occur (because most of the failures are when we access modal widgets to display our animations in) it did not fix the issue. We can still see flaky tests when they initialize a new export or print dialog (see https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/646/workflows/8237e0d1-9660-4d39-ad96-b2ba38aa2d6b/jobs/5294), which makes it seem more likely to be an issue with test concurrency + ordering since we run our tests in parallel and in random order. Testing methods such as _print_file
will close a dialog that a test running in parallel might have just unknowningly sent a signal to. I think the next step is to confirm my hypothesis. Hopefully this is it and we just need to add a dialog fixture or something that provides dialog initialization and cleanup between tests.
Oh, also, as a side note, https://circleci.com/api/v1.1/project/github/freedomofpress/securedrop-client/5298/output/103/0?file=true&allocation-id=6075fe02363fc6688fd0c788-0-build%2F51B3148A happened twice in the past 30 mins, so marking all our functional tests flaky might not be the best solution in the long run, but one flaky issue at a time...
For the 4/15 sprint, @creviera has committed to resolving one type of test flake (possibly the animation-related ones). We'll keep chipping away at this.
For testing, we can use this script to repro and to see that it is fixed.
Meant to say when it is fixed
Immediately post-merge of #1248, I saw the following failure in CircleCI:
yup! #1248 fixed yet another flaky test. and just a heads up that we also have flaky functional tests. but what you linked to here is the runtime error mentioned in this issue which I'm trying to fix or minimize in https://github.com/freedomofpress/securedrop-client/pull/1243
Seeing this issue again: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/1139/workflows/d26456f0-7e44-41ef-9d0a-9ad89b47c956/jobs/6370
Looks like another issue between starting/stopping animations and dialogs.
bumping... we have more eyes on the project so others see this issue now too
it's still pretty rare - nothing like it was before - but it's such a headache that i plan to take a peek :eyes: again after the holiday
@conorsch, @rocodes and I encountered this one today a few times.
Hypothesis:
@rocodes was noting that specially in a testing context we create and tear down objects very quickly.
Note: even if this hypothesis was true, I don't think that should be happening!! :grimacing:
I had a little bit of time to look at this again and I think we need to look carefully at
self.button_animation.frameChanged.connect(self.animate_activestate)
andself.header_animation.frameChanged.connect(self.animate_header)
to understand how the handlers might be called after the export/print dialogs are closed.
With the above hypothesis in mind, it would make sense to me that the issue would occur particularly around signals.
A new test failure occurred in https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/1955/workflows/b8a2a85f-6edb-44d9-b07b-bd65d35ddcf5/jobs/8077 and looks related to https://github.com/freedomofpress/securedrop-client/blob/9ed27b4b81c5ddc6dbb8b13c20fd35a8f83bed5e/securedrop_client/db.py#L297, but will have to investigate later:
tests/functional/test_seen.py::test_seen_and_unseen Fatal Python error: Aborted
The test usually passes so this might point us to a race.
Description
I've seen the following tests in the client behave flakily :):
test_PrintDialog__show_insert_usb_message
(see circleci)test_ExportDialog__export_file
(see circleci)test_PrintDialog__print_file
(see circleci)test_ModalDialog_animation_of_activestate
(see circle)Further investigation needs to be done around why the dialog tests appear to be flaky - could have to do with needing make test assertions before signal handlers are initiated, e.g. the close signal handler will close the dialog so making an assertion on the dialog after it has been closed could cause an issue.
STR
See https://github.com/freedomofpress/securedrop-client/issues/1139#issuecomment-824493066