QUVA-Lab / escnn

Equivariant Steerable CNNs Library for Pytorch https://quva-lab.github.io/escnn/
Other
333 stars 44 forks source link

Migrate unit tests to pytest? #79

Open kalekundert opened 10 months ago

kalekundert commented 10 months ago

While I was working on #78, the test suite caused me a lot of trouble. There were two main problems:

Migrating to pytest wouldn't solve the underlying issue, but it has a few features that I think would help a lot. The biggest is the ability to "mark" tests. There are two marks that I'd want to use extensively. The first is xfail, which indicates that a test is currently broken and expected to fail. The failures are still reported, so you know they're happening, but they don't affect whether or not the whole suite is considered to pass or fail. This means that you can require new contributions to not break any currently working tests, while not having to fix all the broken/flaky tests up front. The second mark is slow, which would indicate that a test takes more than a few seconds to complete. With this information, GitHub could be configured to automatically run all the "not slow" tests for each PR, which would help protect against breakages. The slow tests could either be gradually made fast enough to include in the automated test suite, or just be run manually when there's some reason to.

Pytest also supports running tests in parallel across multiple processes, via the pytest-xdist plugin. This alone wouldn't be enough to make the tests run in a reasonable amount of time, but it would help. Finally, pytest makes it easier to share code between tests (because it doesn't require every test to belong to a class) and has features like parametrization that can cut down on boilerplate. These last two things wouldn't make the tests run any faster, but would make them easier to write.

Is this something you'd be in favor of? I'm not asking for you to do anything yourself, I'm more asking for permission to do something like this the next time I want to make a non-trivial change to the code. (With no guarantees that I'll ever actually get around to it.)


In case it's of any interest, here's the test report I generated for the most recent commit (94380ef):

RUNTIME: 1 day, 10:31:15

TOTAL:   864
PASS:    771
FAIL:     39
KILLED:   54

FAILURES:
  AssertionError: ((24, 24, 10, 10, 10), 9)
      nn.test_convolution.TestConvolution.test_padding_modes_r3conv

  AssertionError: False is not true...
      group.test_combine_subgroups.TestSubgroups.test_o3
      group.test_subgroups.TestSubgroups.test_restrict_so3_o2

  AssertionError: Induction from general representations is not supported yet
      nn.test_pooling.TestPooling.test_induced_norm_pooling

  AssertionError: The error found during equivariance check with element...
      nn.test_r3upsampling.TestUpsampling.test_cyclic_even_nearest
      nn.test_r2upsampling.TestUpsampling.test_cyclic_even_nearest
      nn.test_r3upsampling.TestUpsampling.test_o3_nearest
      nn.test_r2upsampling.TestUpsampling.test_cyclic_odd_nearest
      nn.test_r3upsampling.TestUpsampling.test_dihedral_even_nearest
      nn.test_r3upsampling.TestUpsampling.test_ico_nearest
      nn.test_r3upsampling.TestUpsampling.test_cone_nearest
      nn.test_r2upsampling.TestUpsampling.test_dihedral_odd_bilinear
      nn.test_r3upsampling.TestUpsampling.test_dihedral_nearest
      nn.test_tensorproduct.TestTensor.test_with_spatial_dimns
      nn.test_r3upsampling.TestUpsampling.test_so2_nearest
      nn.test_r3upsampling.TestUpsampling.test_so3_trilinear
      nn.test_r2upsampling.TestUpsampling.test_cyclic_odd_bilinear
      nn.test_r3upsampling.TestUpsampling.test_dihedral_odd_nearest
      nn.test_r3upsampling.TestUpsampling.test_inv_nearest
      nn.test_r3upsampling.TestUpsampling.test_so3_nearest
      nn.test_r3upsampling.TestUpsampling.test_cube_nearest
      nn.test_r2upsampling.TestUpsampling.test_so2_nearest
      nn.test_r2upsampling.TestUpsampling.test_dihedral_even_nearest
      nn.test_r2upsampling.TestUpsampling.test_dihedral_odd_nearest
      nn.test_r2upsampling.TestUpsampling.test_o2_nearest
      nn.test_convolution.TestConvolution.test_ico_sparse

  NotImplementedError
      group.test_subgroups.TestSubgroups.test_restrict_ico_dn
      group.test_subgroups.TestSubgroups.test_restrict_ico_tetra
      group.test_combine_subgroups.TestSubgroups.test_ico
      group.test_subgroups.TestSubgroups.test_restrict_octa_dn
      group.test_subgroups.TestSubgroups.test_restrict_octa_tetra

  RuntimeError: Found no NVIDIA driver on your system...
      nn.test_amp.TestMixedPrecision.test_r2conv_uniform
      nn.test_amp.TestMixedPrecision.test_r3pointconv_non_uniform
      nn.test_amp.TestMixedPrecision.test_r2conv_non_uniform
      nn.test_amp.TestMixedPrecision.test_gnorm
      nn.test_amp.TestMixedPrecision.test_r3conv_non_uniform
      nn.test_amp.TestMixedPrecision.test_fourier
      nn.test_amp.TestMixedPrecision.test_fieldnorm

  RuntimeError: could not construct a memory descriptor using a format tag
      nn.test_export.TestExport.test_R2ConvTransposed

The upsampling equivariance tests are the ones that are flaky. The mixed precision tests probably aren't really failures, I just wasn't running on nodes with GPUs.


To end on a more positive note, I should say that I really appreciate how extensive the test suite is. There are so many academic projects out there with not enough tests, but this is the only one I know of with too many tests! It's a good problem to have, I think.

Gabri95 commented 10 months ago

Hi @kalekundert

Thanks a lot for opening this issue! This is indeed quite annoying and I often end up waiting a long time to rerun all tests before pushing new features.

I am not familiar with pytest but it seems very convenient from what you say, so I support adopting this solution!

Let me comment briefly on the failed tests:

The following failures are suspicious, though. I will take a look, thanks for reporting that!

To end on a more positive note, I should say that I really appreciate how extensive the test suite is. There are so many academic projects out there with not enough tests, but this is the only one I know of with too many tests! It's a good problem to have, I think.

Ahah nice, thanks, I really appreciate this comment! 😄

psteinb commented 10 months ago

I personally would favor a port to pytest too. Let me know, if any help would be needed.

psteinb commented 10 months ago

Another aspect I like with vanilla pytest is, that tests can be marked. This way, I can isolate GPU-only tests from short runtime tests. The latter are usually the default and will be run for every commit (for the sake of rapid feedback). The GPU unit tests I then schedule to run once a day or once a week.

See https://docs.pytest.org/en/latest/how-to/mark.html

kalekundert commented 10 months ago

I don't have any immediate plans to work on this, so feel free to start working on it yourself if you want to. I actually don't think it'd be that big a change. Pytest can run unittest-style tests, so you wouldn't have to rewrite any code. Just add some marks and configure CI to run pytest.

If you do want to work on this, one thing that might be useful is this JSON file I compiled. It's just a dictionary where the keys are test names and the values are how long it takes for that test to run, in seconds. My intention was to use this to quickly mark all the "slow" tests. Some tests will be missing from this dictionary. Those are likely, but not necessarily, the ones that took >4h to complete.