OSGeo / grass

GRASS GIS - free and open-source geospatial processing engine
https://grass.osgeo.org

tests: Add `xfail_windows` decorator for failing tests on Windows #4362

Closed · echoix closed this 1 month ago

echoix commented 1 month ago

This PR is a big step towards being able to safely refactor tests: almost all tests (see footnote) are now run and checked on Windows, and any failure makes the test run fail. The success rate, measured in number of files, is now 96.8%.

The approach used here is to mark all current Windows test failures as expected failures, which allows re-enabling all previously excluded test files. It also means that files where only one or a few tests were failing no longer fail as a whole, so all the other working tests in those files are now required to pass. If a test marked as an expected failure ever ends up passing, it counts as a failure, since it is an unexpected success. This is important: it is what lets us know when we have impacted a test. Skipping a complete file isn’t an appropriate solution, as we would not know the results of all the other good tests in the file.

Plain unittest can take a condition in the skip decorators (skipIf, skipUnless), but no conditional variant exists for expected failures. So a new decorator was created in the utils file that wraps this logic and emits a warning with explanations when the expected failure is actually applied.
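For illustration, here is a minimal sketch of what such a decorator could look like, built on top of `unittest.expectedFailure` (the name matches the PR, but the implementation details here are an assumption, not the actual code):

```python
import functools
import sys
import unittest
import warnings


def xfail_windows(test_item):
    """Mark a test as an expected failure, but only on Windows."""
    if not sys.platform.startswith("win"):
        # On other platforms the test runs normally and must pass.
        return test_item

    @functools.wraps(test_item)
    def wrapper(*args, **kwargs):
        # Warn so the expected failure stays visible in the output and the
        # decorator gets removed once the underlying issue is fixed.
        warnings.warn(
            "Known failure on Windows, reported as an expected failure; "
            "if this test now passes, remove the xfail_windows decorator.",
            stacklevel=2,
        )
        return test_item(*args, **kwargs)

    # unittest reports an exception raised here as an expected failure, and
    # a passing test as an unexpected success, which fails the run.
    return unittest.expectedFailure(wrapper)
```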

The approach to fixing the tests is to find a usage of the `xfail_windows` decorator, fix the test, and remove the decorator. If we happen to modify some behavior that makes a Windows test pass (the run fails, as it is an unexpected success), then we also remove the `xfail_windows` decorator, and from then on that test is not allowed to fail anymore.
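A hedged usage sketch (the import path and the test itself are assumptions; the PR only says the decorator lives in the gunittest utils file):

```python
# Hypothetical test module showing the intended workflow.
from grass.gunittest.case import TestCase
from grass.gunittest.main import test
from grass.gunittest.utils import xfail_windows  # assumed location


class TestSomeTool(TestCase):  # hypothetical test case
    @xfail_windows
    def test_text_output(self):
        # Known to fail on Windows today. Once the underlying issue is
        # fixed, this test passes, is reported as an unexpected success,
        # and fails the run, prompting removal of the decorator.
        ...


if __name__ == "__main__":
    test()
```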

We will now be way more in control of what affects our tests on Windows.

Side note: I have made some great progress on running pytest on the Windows CI. It now runs, still with 15 failures, but not as many as when errors like `cannot load grass_gis.8.5 library` were happening. So it's much better.


Footnotes:

1. I don’t know how to apply an expected failure to a doctest, so they either still fail or end up being excluded in a new Windows-only .gunittest.cfg file.
2. Tests in shell scripts don’t work exactly like unittest tests, so they can’t be individually marked; either the entire file is excluded and never runs, or it stays failing.
3. Since the success percent (in number of files) is an integer, and we got 96.8%, I have to set 96% as the minimum.

echoix commented 1 month ago

I’m kind of proud of this one. Now we can really start working on something, like individually integrating changes from #2616, knowing for sure which changes impact which tests (because we would remove the decorators one by one).

echoix commented 1 month ago

Another important advantage: Windows is the only place where the v.in.pdal tests are run at all, since they are excluded on macOS and Linux, and it is the only place where they are all passing. After this PR, they will be enforced to always pass.

wenzeslaus commented 1 month ago

`v.in.pdal` tests have `@unittest.skipIf(shutil.which("v.in.pdal") is None, "Cannot find v.in.pdal")`. I expect that there is no v.in.pdal on Windows (`--without-pdal`).

wenzeslaus commented 1 month ago
  • I don’t know how to apply an expected failure to a doctest, so they either still fail or end up being excluded in a new Windows-only .gunittest.cfg file.

They already have to be handled in a special way, and it is not clear to me what exactly the unit to disable would be (a documentation block?).

Originally, there was an idea to use doctests for testing; now they are perhaps more for testing that the examples in the documentation are up to date, and for that, a run on one platform would be enough. I think we haven't had much discussion about that lately.

  • Tests in shell scripts don’t work exactly like unittest tests, so they can’t be individually marked; either the entire file is excluded and never runs, or it stays failing.

While at some point there was an idea that one shell script could be more than one test, I think these are the "simple" tests, so one script = one test is what it should be; no need to disable at a finer level.

It is not clear to me how to approach the scripts from a cross-platform perspective. My idea for this functionality was to accommodate existing test scripts in Bash or sh, and people who want to write tests as shell scripts.

  • Since the success percent (in number of files) is an integer, and we got 96.8%, I have to set 96% as the minimum.

Why don't you exclude the failing files? Then you don't have to set the success percent anymore. The success percent setting was just trying to compensate for the missing file-exclusion feature. Now that's in place (and even the better per-function expected failures on Windows), so I don't see a reason to use it anymore.
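For illustration, excluding those files could look roughly like this; the `[gunittest]` section and the `exclude` key follow the pattern of the repository's existing `.gunittest.cfg`, while the paths below are hypothetical placeholders:

```ini
[gunittest]
# Hypothetical entries: files that cannot be marked per-test
# (doctests, shell-script tests) are excluded wholesale.
exclude =
    lib/python/example/testsuite/test_example_doctests.py
    vector/v.example/testsuite/test_v_example.sh
```

With the unfixable files excluded this way, the minimum success percentage could stay at 100 (or be dropped entirely) instead of being lowered to tolerate known failures.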

echoix commented 1 month ago

`v.in.pdal` tests have `@unittest.skipIf(shutil.which("v.in.pdal") is None, "Cannot find v.in.pdal")`. I expect that there is no v.in.pdal on Windows (`--without-pdal`).

Oh, now I see... I was sooo surprised it didn't complain at all...

echoix commented 1 month ago
  • I don’t know how to apply an expected failure to a doctest, so they either still fail or end up being excluded in a new Windows-only .gunittest.cfg file.

They already have to be handled in a special way, and it is not clear to me what exactly the unit to disable would be (a documentation block?).

Originally, there was an idea to use doctests for testing; now they are perhaps more for testing that the examples in the documentation are up to date, and for that, a run on one platform would be enough. I think we haven't had much discussion about that lately.

I didn't know either; that's why I kept it mostly out of scope here. But at the same time, in the recent past we were able to see some failures that were mostly unique to this platform, since the packages in OSGeo4W closely track the latest releases very early on. So for all the failures related to the latest Python, the latest numpy having different output, or the GEOS format changing, we technically see them all here first, if it runs.

  • Tests in shell scripts don’t work exactly like unittest tests, so they can’t be individually marked; either the entire file is excluded and never runs, or it stays failing.

While at some point there was an idea that one shell script could be more than one test, I think these are the "simple" tests, so one script = one test is what it should be; no need to disable at a finer level.

It is not clear to me how to approach the scripts from a cross-platform perspective. My idea for this functionality was to accommodate existing test scripts in Bash or sh, and people who want to write tests as shell scripts.

I also understand that the goal was to accommodate existing tests. On Windows, it's harder, as we need to have MSYS just for that. I did a lot of iterations (not exactly on this PR) using the already packaged OSGeo4W grass-dev package, running the tests from the repo against the installed package. Apart from saving me almost 15 minutes per iteration, it also separated the possible error causes. In these iterations, I struggled a bit to find out why the shell tests weren't picking up everything that was supposed to be set. It turns out the second-build OSGeo4W shell script, only used in our current CI, diverged a bit from OSGeo4W: it did a little bit of magic and added MSYS2 paths everywhere, so it was tightly coupled but ended up working. I found a much more limited approach that places only the needed paths in one place, and it works fine. All that to say: shell script tests are a bit of a pain cross-platform, are way slower, and should be avoided at some point.

  • Since the success percent (in number of files) is an integer, and we got 96.8%, I have to set 96% as the minimum.

Why don't you exclude the failing files? Then you don't have to set the success percent anymore. The success percent setting was just trying to compensate for the missing file-exclusion feature. Now that's in place (and even the better per-function expected failures on Windows), so I don't see a reason to use it anymore.

Because not having them run at all isn't better, and we have seen that once files are excluded, it's way too easy to let them stay excluded forever.

Once merged, I'll see if it's reasonable to accept a floating-point value instead; that would just make this easier. Once merged, it's also easier and safer to do refactorings on the test side. Running the unittests under pytest is getting closer: last time I checked, what was missing was mostly the part where test data files are copied and made available to the test cases, and most of the rest was working fine. So playing with the gunittest reporting won't be needed at that point, and it's not worth taking toooooo much time on it.

echoix commented 1 month ago

After responding to all the comments from @wenzeslaus, I think it is safe to assume that no further changes will be required in this PR, so I'll go forward with it.