OSGeo / grass

GRASS GIS - free and open-source geospatial processing engine
https://grass.osgeo.org

pytest: Mark tests using space_time_raster_dataset as needs_solo_run #3939

Open · echoix opened 4 days ago

echoix commented 4 days ago

The test fixture space_time_raster_dataset used in some pytest tests, including some of the Jupyter tests, is unexpectedly slow at the setup stage when running under pytest with multiple workers. There might be a deadlock problem, or it might just use many processes or threads by itself.

Since it causes some intermittent timeouts, this PR adds the marker so that these tests run separately, without other tests running in parallel. The setup stage is still slow, but after a couple of tries I expect this will prevent more of the timeouts that were still present after https://github.com/OSGeo/grass/pull/3879.
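For illustration, a minimal sketch of how such a test is marked; the test body here is hypothetical, and only the `needs_solo_run` marker name and the `space_time_raster_dataset` fixture come from this PR:

```python
import pytest

# Hypothetical test that takes the slow fixture; the marker lets CI exclude
# it from the parallel run and execute it in a separate, solo step.
@pytest.mark.needs_solo_run
def test_uses_strds(space_time_raster_dataset):
    assert space_time_raster_dataset is not None
```

Assuming pytest-xdist provides the workers, the split is then roughly `pytest -m "not needs_solo_run" -n auto` for the parallel step, followed by `pytest -m needs_solo_run` for the solo step.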

On the positive side, there are (hopefully) fewer flaky tests to retry, which can cause delays when there is a lot of activity. On the negative side, since these tests are slow AND the setup phase of that fixture is also slow, the overall time taken for the tests increases compared to the sum of the parallel and serial test step durations. In other words, we are temporarily giving up some of the speedup from running them in parallel. It is still better than having no parallel workers, as before. Either way, some of the deadlock issues, if they are caused by multiprocessing using the fork start method, will need to be fixed in order to run any of these tests on macOS and Windows. That part is a work in progress (not this PR; this PR is ready).
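(For context, a generic sketch of the start-method difference, not GRASS code: `fork` is the historical default on Linux but is unavailable on Windows and has long been problematic on macOS, whereas `spawn` works on all three platforms.)

```python
import multiprocessing as mp

def square(n):
    return n * n

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter in the child instead of forking,
    # so the same code runs on Linux, macOS, and Windows.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))
```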

echoix commented 4 days ago

Also note that the "coverage" change you might see is still flawed, as this PR adds new files "seen" by the coverage tool. It doesn't know about all the files in the repo yet, so it now sees 54 new files.

neteler commented 3 days ago

> The test fixture space_time_raster_dataset used in some pytest tests, including some of the Jupyter tests, is unexpectedly slow at the setup stage when running under pytest with multiple workers. There might be a deadlock problem, or it might just use many processes or threads by itself.

Wild guess without checking the code: could it be a SQLite locking issue caused by concurrent SQLite (read/)write access?
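(That kind of contention is easy to reproduce with plain sqlite3; a generic sketch, not the temporal framework's actual access pattern:)

```python
import sqlite3

# Two connections to the same database file, with a short busy timeout.
con1 = sqlite3.connect("demo.db", timeout=0.1)
con2 = sqlite3.connect("demo.db", timeout=0.1)
con1.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
con1.execute("BEGIN IMMEDIATE")      # first writer takes the write lock
try:
    con2.execute("BEGIN IMMEDIATE")  # second writer waits, then gives up
except sqlite3.OperationalError as err:
    print(err)                       # "database is locked"
finally:
    con1.rollback()
```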

echoix commented 3 days ago

> > The test fixture space_time_raster_dataset used in some pytest tests, including some of the Jupyter tests, is unexpectedly slow at the setup stage when running under pytest with multiple workers. There might be a deadlock problem, or it might just use many processes or threads by itself.
>
> Wild guess without checking the code: could it be a SQLite locking issue caused by concurrent SQLite (read/)write access?

I'm not sure, as trying to follow what gets called within the temporal modules is a nightmare; everything gets invoked/imported. Technically, it is supposed to be only 7 rasters, with that fixture valid for all the tests. But something isn't quite right.

Once coverage is properly set up (maybe once C code is also tracked), it will be a good opportunity to incrementally change the tests and see what changes, in order to simplify redundant tests (by comparing the coverage provided by one test versus another).

In a separate attempt last weekend to profile some tests (gunittest), I started with only one subfolder in temporal, t.rast.algebra, and a lot of time (18%) was spent in a single inner loop of PLY that checked whether an element was in a list. Other than that, a big chunk of time was wasted importing tgis, as it imports almost everything through star imports.
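(As a generic illustration of why such a membership check can dominate a profile: `in` on a list is O(n), while on a set it is O(1) on average; the names below are made up.)

```python
import timeit

symbols = [f"sym{i}" for i in range(500)]
symbol_set = set(symbols)

# Worst case for the list: the element sits near the end.
print(timeit.timeit(lambda: "sym499" in symbols, number=100_000))
print(timeit.timeit(lambda: "sym499" in symbol_set, number=100_000))
```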

So the problem can be anywhere.

For now, the goal is only to prevent useless failures, so we get correct feedback earlier (without waiting for a job to be retried).

echoix commented 3 days ago

I have other changes queued for these files (dating from the last day of the sprint), so I'm waiting for this to be merged first.

echoix commented 1 day ago

I had to rerun about 4-5 pytest failures on main that passed on a second try. I had hoped that this would have fixed it.