freedomofpress / securedrop-client

a Qt-based GUI for SecureDrop journalists 📰🗞️
GNU Affero General Public License v3.0
41 stars 40 forks source link

Investigate flaky tests #1139

Open sssoleileraaa opened 4 years ago

sssoleileraaa commented 4 years ago

Description

I've seen the following tests in the client behave flakily :):

Further investigation needs to be done around why the dialog tests appear to be flaky - could have to do with needing make test assertions before signal handlers are initiated, e.g. the close signal handler will close the dialog so making an assertion on the dialog after it has been closed could cause an issue.

STR

See https://github.com/freedomofpress/securedrop-client/issues/1139#issuecomment-824493066

sssoleileraaa commented 4 years ago

This could be related to https://github.com/freedomofpress/securedrop-client/pull/1134 because that's when I started seeing the flakes. Keeping track of how often this starts happening here.

Update

We've confirmed that the problem persists even when we revert #1134

sssoleileraaa commented 4 years ago

Also the flakes only seem to occur in circleci. You can see how rerunning the workflow on the main branch resolved it here for #347: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client?branch=main.

Update

We're able to reproduce locally now (see STR in this Issue's description)

eloquence commented 4 years ago

We'll continue to keep an eye on this. @rmol has encountered these flakes as well, and has committed to at least a brief investigation during the 8/20-9/2 sprint.

eloquence commented 4 years ago

No progress on this last sprint but test integrity is important, so @rmol will take a look 9/3-9/17, same timebox.

eloquence commented 4 years ago

We'll continue to watch for flaky tests on client PRs during the 9/17-10/1 sprint and @rmol will help investigate as warranted.

emkll commented 4 years ago

I observed a failed test run in https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/395/workflows/53167775-c4c5-4d9e-abc0-51a50d1180ab/jobs/4685

sssoleileraaa commented 3 years ago

new example of the same issue still occurring: https://circleci.com/api/v1.1/project/github/freedomofpress/securedrop-client/4822/output/103/0?file=true&allocation-id=5f99bacd4771a47ad89e8a53-0-build%2F5A1221DF

eloquence commented 3 years ago

Given these continued test flakes, we're keeping this on the 10/28-11/12 sprint for continued investigation. Would be good to better understand cause of the flake before the November 4 release, to reduce risk of unintended regression beyond the test flakiness.

sssoleileraaa commented 3 years ago

The nice thing about these flakes is that the second time I run ci, the tests seem to always pass.

eloquence commented 3 years ago

@creviera has committed to poking at John's findings in #1158 during the 11/12-11/25 sprint to see if we can get closer to a root cause here.

kushaldas commented 3 years ago

I also noticed this today in the CI for many times while reviewing #1222. It happened on various different widgets.

https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5091 https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5092 https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5093 https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/567/workflows/6823f779-39fd-4569-9d3d-cb03783941d3/jobs/5094

emkll commented 3 years ago

I have observed three failures in a row on main this morning, all different tests:

sssoleileraaa commented 3 years ago

I had a little bit of time to look at this again and I think we need to look carefully at self.button_animation.frameChanged.connect(self.animate_activestate) and self.header_animation.frameChanged.connect(self.animate_header) to understand how the handlers might be called after the export/print dialogs are closed.

sssoleileraaa commented 3 years ago

As a reminder, we show an animation next to the file name in the header when we are booting up sd-devices and we show an animation in the continue_button when it's making a request to sd-devices

sssoleileraaa commented 3 years ago

I checked if removing the animations fixed the problem, and even though it made the test failures a lot less likely to occur (because most of the failures are when we access modal widgets to display our animations in) it did not fix the issue. We can still see flaky tests when they initialize a new export or print dialog (see https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/646/workflows/8237e0d1-9660-4d39-ad96-b2ba38aa2d6b/jobs/5294), which makes it seem more likely to be an issue with test concurrency + ordering since we run our tests in parallel and in random order. Testing methods such as _print_file will close a dialog that a test running in parallel might have just unknowningly sent a signal to. I think the next step is to confirm my hypothesis. Hopefully this is it and we just need to add a dialog fixture or something that provides dialog initialization and cleanup between tests.

sssoleileraaa commented 3 years ago

Oh, also, as a side note, https://circleci.com/api/v1.1/project/github/freedomofpress/securedrop-client/5298/output/103/0?file=true&allocation-id=6075fe02363fc6688fd0c788-0-build%2F51B3148A happened twice in the past 30 mins, so marking all our functional tests flaky might not be the best solution in the long run, but one flaky issue at a time...

eloquence commented 3 years ago

For the 4/15 sprint, @creviera has committed to resolving one type of test flake (possibly the animation-related ones). We'll keep chipping away at this.

sssoleileraaa commented 3 years ago

For testing, we can use this script to repro and to see that it is fixed.

Update

Meant to say when it is fixed

conorsch commented 3 years ago

Immediately post-merge of #1248, I saw the following failure in CircleCI:

failure output ``` =================================== FAILURES =================================== ______ test_ExportDialog__update_dialog_when_status_is_USB_NOT_CONNECTED _______ mocker = def test_ExportDialog__update_dialog_when_status_is_USB_NOT_CONNECTED(mocker): > dialog = ExportDialog(mocker.MagicMock(), "mock_uuid", "mock_filename") tests/gui/test_widgets.py:3827: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = controller = , file_uuid = 'mock_uuid' file_name = 'mock_filename' def __init__(self, controller: Controller, file_uuid: str, file_name: str): super().__init__() self.controller = controller self.file_uuid = file_uuid self.file_name = SecureQLabel( file_name, wordwrap=False, max_length=self.FILENAME_WIDTH_PX ).text() self.error_status = "" # Hold onto the error status we receive from the Export VM # Connect controller signals to slots self.controller.export.preflight_check_call_success.connect(self._on_preflight_success) self.controller.export.preflight_check_call_failure.connect(self._on_preflight_failure) self.controller.export.export_usb_call_success.connect(self._on_export_success) self.controller.export.export_usb_call_failure.connect(self._on_export_failure) # Connect parent signals to slots > self.continue_button.setEnabled(False) E RuntimeError: wrapped C/C++ object of type QPushButton has been deleted securedrop_client/gui/widgets.py:2730: RuntimeError ```
sssoleileraaa commented 3 years ago

yup! #1248 fixed yet another flaky test. and just a heads up that we also have flaky functional tests. but what you linked to here is the runtime error mentioned in this issue which I'm trying to fix or minimize in https://github.com/freedomofpress/securedrop-client/pull/1243

sssoleileraaa commented 2 years ago

Seeing this issue again: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/1139/workflows/d26456f0-7e44-41ef-9d0a-9ad89b47c956/jobs/6370

Looks like another issue between starting/stopping animations and dialogs.

Also: https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/1139/workflows/d26456f0-7e44-41ef-9d0a-9ad89b47c956/jobs/6370

sssoleileraaa commented 2 years ago

bumping... we have more eyes on the project so others see this issue now too

sssoleileraaa commented 2 years ago

it's still pretty rare - nothing like it was before - but it's such a headache that i plan to take a peek :eyes: again after the holiday

gonzalo-bulnes commented 2 years ago

@conorsch, @rocodes and I encountered this one today a few times.

Hypothesis:

@rocodes was noting that specially in a testing context we create and tear down objects very quickly.

Note: even if this hypothesis was true, I don't think that should be happening!! :grimacing:

gonzalo-bulnes commented 2 years ago

I had a little bit of time to look at this again and I think we need to look carefully at self.button_animation.frameChanged.connect(self.animate_activestate) and self.header_animation.frameChanged.connect(self.animate_header) to understand how the handlers might be called after the export/print dialogs are closed.

With the above hypothesis in mind, it would make sense to me that the issue would occur particularly around signals.

sssoleileraaa commented 2 years ago

A new test failure occurred in https://app.circleci.com/pipelines/github/freedomofpress/securedrop-client/1955/workflows/b8a2a85f-6edb-44d9-b07b-bd65d35ddcf5/jobs/8077 and looks related to https://github.com/freedomofpress/securedrop-client/blob/9ed27b4b81c5ddc6dbb8b13c20fd35a8f83bed5e/securedrop_client/db.py#L297, but will have to investigate later:

tests/functional/test_seen.py::test_seen_and_unseen Fatal Python error: Aborted

The test usually passes so this might point us to a race.