Cantera / cantera

Chemical kinetics, thermodynamics, and transport tool suite
https://cantera.org

Pytest transition #1793

Closed wandadars closed 1 month ago

wandadars commented 2 months ago

Updating the Python unit tests to use pytest uniformly across all of the test files. Ray had mentioned at one point that the pytest transition was going a bit slowly, so I thought I'd give it a shot. I still have about 3 test files left, and they are quite large.


bryanwweber commented 2 months ago

Thanks for starting this, Chris. If we're considering this change, I think we should also use native pytest fixtures instead of class initializers for test setup.
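
For illustration, a minimal sketch of what that shift can look like (the fixture and test names here are hypothetical, not taken from the Cantera suite):

```python
import pytest
import cantera as ct

# unittest-style: shared state built in a class-level setup hook
# class TestSomething(utilities.CanteraTest):
#     def setUp(self):
#         self.gas = ct.Solution("h2o2.yaml")
#         self.gas.TPX = 300, ct.one_atm, "H2:2, O2:1"

# pytest-style: the same state provided by a fixture and requested by name
@pytest.fixture
def gas():
    """Fresh Solution object for each test that requests it."""
    gas = ct.Solution("h2o2.yaml")
    gas.TPX = 300, ct.one_atm, "H2:2, O2:1"
    return gas

def test_mole_fractions_sum_to_one(gas):
    assert sum(gas.X) == pytest.approx(1.0)
```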

ischoegl commented 2 months ago

> Thanks for starting this, Chris. If we're considering this change, I think we should also use native pytest fixtures instead of class initializers for test setup.

pytest fixtures involve a shift in paradigms that would require a complete rewrite of much of our test suite (at least based on my understanding). While there are advantages, I don’t think that this is the best investment of developer time, but YMMV. So I am personally 👍 on this PR as it takes a step in the right direction.

wandadars commented 2 months ago

> Thanks for starting this, Chris. If we're considering this change, I think we should also use native pytest fixtures instead of class initializers for test setup.

I started down that path, but got a bit mixed up and overwhelmed by the number of changes, so I reverted to the standard class approach while I'm updating all of the tests. After that, I will try to see if I can make things more idiomatic to pytest.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 73.23%. Comparing base (3f7d0ff) to head (12ce03f). Report is 56 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1793      +/-   ##
==========================================
- Coverage   73.24%   73.23%   -0.01%
==========================================
  Files         381      381
  Lines       54375    54377       +2
  Branches     9253     9066     -187
==========================================
- Hits        39826    39823       -3
- Misses      11578    11582       +4
- Partials     2971     2972       +1
```


wandadars commented 2 months ago

> Thanks for starting this, Chris. If we're considering this change, I think we should also use native pytest fixtures instead of class initializers for test setup.

> pytest fixtures involve a shift in paradigms that would require a complete rewrite of much of our test suite (at least based on my understanding). While there are advantages, I don’t think that this is the best investment of developer time, but YMMV. So I am personally 👍 on this PR as it takes a step in the right direction.

I may regret these words, but so far the fixtures haven't been too bad with regard to replacing the standard base class/derived class logic and the older setup and teardown class methods. I haven't pushed the changes yet; I want to finish converting one of the larger files first (I started with the smallest test files and am incrementally moving toward the larger ones, making sure the tests for each file execute the same).

wandadars commented 2 months ago

I don't know if I made anything better with these commits, but all the tests appear to be happy.

Edit: they were happy on my machine. I think something is different with the GitHub CI. Pushing some commits to try to remedy the issue.

wandadars commented 2 months ago

I'm not sure why all the other tests are finding their data, but the ones that use the jacobian-tests.yaml are failing.

Edit: I'm not adding the local path to the data anymore, i.e., the cantera/test/data path isn't getting set properly.

wandadars commented 2 months ago

It might be related to test discovery differences, but I do see that on the main branch, running scons test-python reports

2286 passed, 82 skipped, 24 xfail

On this branch we have:

2346 passed, 83 skipped, 24 xfail.

I haven't written any tests, so perhaps the old test discovery was missing some tests?

wandadars commented 2 months ago

@bryanwweber I'm not sure what is happening with the tests on GitHub. I've tried a few things, but I can't figure out why the tests fail when they run on GitHub because they can't locate the jacobian-tests.yaml file. Locally the tests run fine. I don't know if this is more in your domain of knowledge. I suspect it might have something to do with me moving the fixtures that define the class-level values of test_work_path, etc. from the utilities module to the conftest.py file. The value of TEST_DATA_PATH seems to be having issues, I believe.
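
As context for the path issue, a minimal sketch of how a session-level fixture in conftest.py might register the test data directory, assuming cantera.add_directory is used (the fixture name and path constant are illustrative, not necessarily what this PR does):

```python
# conftest.py -- illustrative sketch only, not this PR's actual implementation
from pathlib import Path

import pytest
import cantera as ct

# Assumed layout: this conftest.py sits next to the cantera/test/data directory
TEST_DATA_PATH = Path(__file__).parent / "data"

@pytest.fixture(scope="session", autouse=True)
def add_test_data_path():
    """Make YAML inputs such as jacobian-tests.yaml discoverable by Cantera."""
    ct.add_directory(str(TEST_DATA_PATH))
```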

speth commented 2 months ago

> It might be related to test discovery differences, but I do see that on the main branch, running scons test-python reports
>
> 2286 passed, 82 skipped, 24 xfail
>
> On this branch we have:
>
> 2346 passed, 83 skipped, 24 xfail.
>
> I haven't written any tests, so perhaps the old test discovery was missing some tests?

Finding the sources of these changes is a bit challenging due to some of the changed test names (which I think generally make sense), but it seems that most of the increased test count comes from an issue with the use of the TestReaction class in test_kinetics.py. This class should not be used as a base class for all classes that test specific rate types, since this results in re-running all the tests from TestReaction (for example, test_list_from_file) even though there is no difference in the conditions of the test.

bryanwweber commented 2 months ago

> This class should not be used as a base class for all classes that test specific rate types

One of the comments I have in general about switching to pytest is that class inheritance is rarely used in idiomatic pytest suites. Rather, module-level functions are preferred, followed by methods where the class is simply used for logical grouping rather than shared logic. Shared logic should almost always be done by fixtures because, well, that's just how pytest works best :smile:
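
As a sketch of that pattern (hypothetical class and test names, not the suite's actual ones), shared setup can live in a fixture that grouping-only classes request, so no test methods are inherited and re-run:

```python
import pytest
import cantera as ct

@pytest.fixture(scope="class")
def kin_gas():
    """Shared Solution object reused by the tests grouped in each class."""
    return ct.Solution("h2o2.yaml")

class TestArrheniusRates:
    # Plain class used only for grouping; it inherits nothing.
    def test_rate_constant_count(self, kin_gas):
        assert len(kin_gas.forward_rate_constants) == kin_gas.n_reactions

class TestFalloffRates:
    def test_has_reactions(self, kin_gas):
        assert kin_gas.n_reactions > 0
```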

ischoegl commented 2 months ago

> This class should not be used as a base class for all classes that test specific rate types

> One of the comments I have in general about switching to pytest is that class inheritance is rarely used in idiomatic pytest suites. Rather, module-level functions are preferred, followed by methods where the class is simply used for logical grouping rather than shared logic. Shared logic should almost always be done by fixtures because, well, that's just how pytest works best 😄

... I have likewise seen that class inheritance is rarely used in pytest. At the same time, a lot of our test suite makes extensive use of shared logic, which is the change of paradigms I had commented on earlier. I am definitely not rooting for a rewrite of our test suite.

bryanwweber commented 2 months ago

> I am definitely not rooting for a rewrite of our test suite.

Likewise, however, I think having a mix of unittest and pytest conventions is probably the worst of both worlds. I do worry that the level of effort for a full rewrite is just too high to accomplish, and that while a halfway measure may be a good start, it would never actually get finished. So I'm torn about insisting on "all or nothing" with pytest idioms.

speth commented 2 months ago

> One of the comments I have in general about switching to pytest is that class inheritance is rarely used in idiomatic pytest suites. Rather, module-level functions are preferred, followed by methods where the class is simply used for logical grouping rather than shared logic. Shared logic should almost always be done by fixtures because, well, that's just how pytest works best 😄

> ... I have likewise seen that class inheritance is rarely used in pytest. At the same time, a lot of our test suite makes extensive use of shared logic, which is the change of paradigms I had commented on earlier. I am definitely not rooting for a rewrite of our test suite.

Without trying to pull @wandadars in too many different directions, my thought was that the goal for this PR should be to make it possible for us to fully use pytest's features, without necessarily transforming all of our tests into fully idiomatic pytest style. The main impediment to using some of pytest's core features like fixtures was the use of the unittest.TestCase class and its features. So eliminating the remaining use of the assertXyz, setUp, setUpClass, etc. methods from TestCase in favor of features provided by pytest makes a lot of sense. I think more improvements to make use of other pytest features might be better suited for other PRs that focus on specific parts of the test suite, as I don't think these transformations are going to be quite as mechanical as the ones that were (initially) the focus here.
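
A rough sketch of the mechanical part of that translation (the left-hand names are the unittest/TestCase and existing test-utility methods being dropped; the exact replacements chosen in this PR may differ):

```python
import pytest
from pytest import approx

# self.assertEqual(a, b)         ->  assert a == b
# self.assertTrue(flag)          ->  assert flag
# self.assertNear(a, b, rtol)    ->  assert a == approx(b, rel=rtol)
# self.assertRaises(SomeError)   ->  with pytest.raises(SomeError): ...
# setUp / setUpClass             ->  function- or class-scoped fixtures

def test_translation_example():
    value = 2.0 + 2.0
    assert value == approx(4.0, rel=1e-12)
    with pytest.raises(ZeroDivisionError):
        1.0 / 0.0
```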

ischoegl commented 2 months ago

> […] So eliminating the remaining use of the assertXyz, setUp, setUpClass, etc. methods from TestCase in favor of features provided by pytest makes a lot of sense. I think more improvements to make use of other pytest features might be better suited for other PRs that focus on specific parts of the test suite, as I don't think these transformations are going to be quite as mechanical as the ones that were (initially) the focus here.

I tend to agree that this is what makes the most sense for this PR. Going beyond would require a rewrite.

wandadars commented 2 months ago

In test_purefluid.py, I adjusted the tolerance factor from 50 to 70 in the test_pressure test, as there seems to be some difference between the equation used in assertNear and the pytest approx() function. The test was failing using pytest.approx, as it was comparing something like 36,000 ± 30,000 to 68,900, which passed before.
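
One plausible source of that discrepancy, sketched below, is that pytest.approx(expected, rel=...) scales its tolerance by the expected value alone, whereas an assertNear-style check is (assumed here, not copied from the old utility) symmetric in the two values. The numbers are chosen only to show the asymmetry, not to reproduce the actual test:

```python
import pytest

def near_symmetric(a, b, rtol):
    # Assumed form of an assertNear-style check: the tolerance scales with
    # the average magnitude of both values, so swapping a and b changes nothing.
    return 2 * abs(a - b) / (abs(a) + abs(b)) < rtol

a, b, rtol = 68_900.0, 36_000.0, 0.8
print(near_symmetric(a, b, rtol))       # True:  0.63 < 0.8
print(a == pytest.approx(b, rel=rtol))  # False: 32,900 > 0.8 * 36,000 = 28,800
print(b == pytest.approx(a, rel=rtol))  # True:  32,900 < 0.8 * 68,900 = 55,120
```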

wandadars commented 1 month ago

@speth I converted test_composite.py away from assertArrayNear() and used approx() in its place. I know @bryanwweber had suggested using assert_allclose(). I just wanted to double-check what should be done before I go through all the tests.

The following script generates the output for the two different approaches.

```python
import numpy as np
import pytest
import numpy.testing as npt

# Define two arrays for comparison
A = np.array([1.0, 2.0, 3.0])
B = np.array([1.0, 2.0001, 3.0002])

def test_compare_with_pytest_approx():
    # Use pytest.approx() directly on the NumPy arrays
    assert A == pytest.approx(B, rel=1e-6)

def test_compare_with_assert_allclose():
    # Use numpy.testing.assert_allclose to compare the arrays
    npt.assert_allclose(A, B, rtol=1e-6)
```

approx() output:

```
E   assert array([1., 2., 3.]) == approx([1.0 ± 1.0e-06, 2.0001 ± 2.0e-06, 3.0002 ± 3.0e-06])
E    +  where approx([1.0 ± 1.0e-06, 2.0001 ± 2.0e-06, 3.0002 ± 3.0e-06]) = <function approx at 0x70b3e1810820>(array([1.    , 2.0001, 3.0002]), rel=1e-06)
E    +    where <function approx at 0x70b3e1810820> = pytest.approx
```

assert_allclose() output:

```
E   AssertionError: 
E   Not equal to tolerance rtol=1e-06, atol=0
E   
E   Mismatched elements: 2 / 3 (66.7%)
E   Max absolute difference: 0.0002
E   Max relative difference: 6.66622225e-05
E    x: array([1., 2., 3.])
E    y: array([1.    , 2.0001, 3.0002])
```

speth commented 1 month ago

Interesting, that's not at all what I get for the approx case. Using pytest 8.0.1, I get:

```
    def test_compare_with_pytest_approx():
        # Use pytest.approx() directly on the NumPy arrays
>       assert A == pytest.approx(B, rel=1e-6)
E       assert array([1., 2., 3.]) == approx([1.0 ±...02 ± 3.0e-06])
E
E         comparison failed. Mismatched elements: 2 / 3:
E         Max absolute difference: 0.00019999999999997797
E         Max relative difference: 6.666666666665932e-05
E         Index | Obtained | Expected
E         (1,)  | 2.0      | 2.0001 ± 2.0e-06
E         (2,)  | 3.0      | 3.0002 ± 3.0e-06

test_approx.py:11: AssertionError
```

While the output is fairly similar, what I like about the approx output is that it shows just the specific failing elements, which is helpful when the arrays are larger and can't be displayed in full (or where displaying them in full isn't particularly helpful).

What version of pytest are you using?

wandadars commented 1 month ago

I must have an old version on my Ubuntu VM, because on my other machine, using pytest version 8.2.2, I see the output you've shown in your example. So the consensus is that approx() should be what replaces assertArrayNear() then?

bryanwweber commented 1 month ago

> So the consensus is that approx() should be what replaces assertArrayNear() then?

That seems good to me!
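
The resulting mechanical replacement then looks roughly like this (the old helper's signature is paraphrased from memory; real call sites may pass other arguments):

```python
import numpy as np
from pytest import approx

computed = np.array([1.0, 2.0, 3.0])
expected = np.array([1.0, 2.0, 3.0])

# Old style (Cantera test utility, signature paraphrased):
#     self.assertArrayNear(computed, expected, rtol=1e-8, atol=1e-12)
# New style with pytest:
assert computed == approx(expected, rel=1e-8, abs=1e-12)
```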

wandadars commented 1 month ago

I think I've got most of the suggestions implemented. I'm still perplexed about the failing tests on GitHub, though.

bryanwweber commented 1 month ago

The tests are failing because the test/data path is not on the search path for input files. I'm guessing it's something to do with your path changes for those tests.

wandadars commented 1 month ago

The CI issue should be fixed now, which has exposed some issues with the tests when they are run on macOS.

speth commented 1 month ago

> The CI issue should be fixed now, which has exposed some issues with the tests when they are run on macOS.

I think it would be safe to just relax the relative tolerance of TestFalloff by a factor of 2 or so.
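
A relaxation of that sort only touches the rel argument of the failing comparison; for example (made-up values, not the actual TestFalloff tolerances):

```python
from pytest import approx

kf_reference = 2.8e10        # made-up reference rate constant
kf_computed = 2.8000004e10   # made-up value with platform-dependent round-off

# before: assert kf_computed == approx(kf_reference, rel=1e-7)
#         fails, since |diff| = 4.0e3 > 1e-7 * 2.8e10 = 2.8e3
assert kf_computed == approx(kf_reference, rel=2e-7)   # 4.0e3 < 5.6e3, passes
```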

wandadars commented 1 month ago

> The CI issue should be fixed now, which has exposed some issues with the tests when they are run on macOS.

> I think it would be safe to just relax the relative tolerance of TestFalloff by a factor of 2 or so.

Alrighty. I did update some of the smaller test files to use a structure that is more idiomatic to pytest (versus the class-attribute side-effect approach that I originally used when transitioning away from the unittest style).

wandadars commented 1 month ago

@speth I believe I have addressed all the suggestions/comments.

bryanwweber commented 1 month ago

Congrats @wandadars! Great work on this