TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Test restructuring #2237

Closed mhauru closed 3 weeks ago

mhauru commented 1 month ago

This PR makes two orthogonal changes to how tests are run and structured.

  1. All test files included in runtests.jl are made into modules, and imports are moved into those modules. runtests.jl now has minimal imports. This cleans up the namespaces in which the individual tests are run, and makes it clearer where various references in tests point to.
  2. Test files can now be included selectively using keyword arguments. If one runs e.g. Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"]) then only test files that have the substring "optim" or "hmc" in their path will be run, unless they also have one of the substrings "ext" or "nelder" in their path, in which case they are skipped. By default all tests are run. Substring comparisons are case-insensitive.
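The filtering in point 2 could be sketched roughly like this (a hedged sketch; the function names are illustrative and not necessarily what the PR implements):

```julia
# Split ARGS into include and exclude substrings at the "--skip" marker.
function parse_test_args(args)
    skip_idx = findfirst(==("--skip"), args)
    if skip_idx === nothing
        return args, String[]
    else
        return args[1:skip_idx-1], args[skip_idx+1:end]
    end
end

# A file is run if it matches any include pattern (or none were given)
# and matches no exclude pattern. Comparisons are case-insensitive.
function should_run(path, includes, excludes)
    lp = lowercase(path)
    included = isempty(includes) || any(s -> occursin(lowercase(s), lp), includes)
    excluded = any(s -> occursin(lowercase(s), lp), excludes)
    return included && !excluded
end
```

With the example arguments above, parse_test_args(["optim", "hmc", "--skip", "ext", "nelder"]) yields the include list ["optim", "hmc"] and the exclude list ["ext", "nelder"].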

Point 2. makes the @turing_testset and @numerical_testset macros unnecessary, and they will be removed.

The selective running of tests is useful both for local development and for splitting our CI to run tests for various parts of Turing.jl in parallel (#2179). I'll try implementing the latter next to see that it goes smoothly.

This is an early draft; I'm opening it to get comments (@yebai, @torfjelde). I've only converted a couple of files to the submodule format, and the selective running lacks documentation. If this is deemed a good design I'll go and finish the implementation.

mhauru commented 1 month ago

For the record, optimally I would have liked the selective test running to be based on testset names rather than file paths. I tried a couple of designs for that, but ran into various limitations and complexities in how AbstractTestSet and nesting macros work. This could be revisited later.

yebai commented 1 month ago

cc @willtebbutt @devmotion who might have thoughts.

willtebbutt commented 1 month ago

I'm in favour of these kinds of changes -- it's good to clean up the namespaces, and to remove these macros in favour of using standard functionality.

Re parallelisation of tests: I've generally gone about this in a very slightly different way (see e.g. Tapir's tests and the associated action), but it's not obvious to me that either approach is necessarily better than the other.

mhauru commented 1 month ago

There's a file called test/essential/container.jl that's not being included in runtests.jl, or anywhere else. Does someone know if it should be deleted or included in the test suite?

yebai commented 1 month ago

It should be included.

mhauru commented 1 month ago

I finished the restructuring described in the opening message. I also removed the code coverage part of CI, because it seems to have been broken for more than a year. If we want to reintroduce it, I would propose running it as a separate job outside the current test matrix, since it requires running all the tests in one go and only on a single platform.

More work could be done harmonising the way we do imports and qualified references, but I think that could be a separate PR. In this one I avoided touching anything in the actual test code, modifying only the "header" of each file.

The longest test CI job now takes around 25 minutes, so more could perhaps be done to make tests faster, either by making them more lightweight or by splitting some of the heavy test files into several files. (EDIT: I did some further splitting, so now it's more like 20 minutes.)

devmotion commented 1 month ago

I appreciate the effort, it's good to clean up and improve the test structure a bit 🙂

My feeling is that we should neither implement nor use parsing of test args (and hence not support something like Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])). In my experience it's much easier to use ENV variables for test GROUPs (similar to what @willtebbutt uses in Tapir, it seems), since this requires (almost) no additional code and is easier to use (you can just update the ENV variable in the julia REPL and (re)run ] test). It's also the approach taken throughout the SciML ecosystem and in e.g. Bijectors (https://github.com/TuringLang/Bijectors.jl/blob/1070b7349f14626808765e32621db82e64969f29/test/runtests.jl#L32-L61), DynamicPPL (https://github.com/TuringLang/DynamicPPL.jl/blob/master/test/runtests.jl), and DistributionsAD.
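The GROUP-style setup described here amounts to roughly the following (a hedged sketch, not Bijectors' or DynamicPPL's exact code; the group names are made up):

```julia
# The selected test group is read from an environment variable,
# with "All" (run everything) as the default.
should_run_group(group, selected = get(ENV, "GROUP", "All")) =
    selected == "All" || selected == group

# In runtests.jl each group then guards its own includes, e.g.:
# should_run_group("Core") && include("core.jl")
```

A run is then selected with e.g. GROUP=Core julia --project -e 'import Pkg; Pkg.test()', or by setting ENV["GROUP"] in the REPL before ] test.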

mhauru commented 1 month ago

Thanks for the comment @devmotion, happy to discuss this. Here are reasons why I went with ARGS rather than env vars, not in any particular order:

  1. I wanted to have the option to include and exclude multiple test files. This requires some parsing of the input, and doing that for command line arguments is more natural and more standard practice than for env vars, which are usually single values.
  2. Pkg.test has features explicitly for this, which I think is a strong hint that this can be considered the standard way of doing test selection in Julia. Pkg.test docs mention exactly this as an intended use case of test_args.
  3. I find command line arguments handier when running things manually: I'm more used to passing args than setting env vars when I run things from a terminal. For CI I think both work equally well.

It's also the approach taken throughout the SciML ecosystem and in e.g. Bijectors, DynamicPPL, and DistributionsAD.

That's a fair point that I hadn't considered. At the same time, the ARGS approach is what Julia itself does (I took the name of --skip from the tests of stdlib).

it's much easier to use ENV variables for test GROUPs (similar to what @willtebbutt uses in Tapir it seems) since this does (almost) not require any additional code

I would argue that using ARGS does not require any more extra code in runtests.jl. The only reason why there's any substantial extra code here (still <30 lines + docstrings) is because of parsing of include/exclude tests, which I would like to have regardless of whether we use env vars or ARGS.

you can just update the ENV variable in the julia REPL and (re)run ] test

You can have a similar user experience with ARGS by doing empty!(ARGS); append!(ARGS, ["my", "new", "--skip", "args"]) and then (re)running ] test.

devmotion commented 1 month ago

I can see these points and I'm sure tests could be structured with ARGS as well - but I think it would be good to follow common practices in the ecosystem and use environment variables.

IMO one of the main problems with the current Turing test setup is that it does not follow common practices in the ecosystem and defines its own macros for structuring the tests. My impression is that this makes it unnecessarily complicated for new contributors to read, modify, and add tests and adds additional maintenance burden on our side. My worry is that with an uncommon practice such as ARGS (I haven't encountered a single Julia package yet that uses it to structure tests) we will still make it more complicated than necessary for new contributors (and the custom parsing requires additional maintenance on our side).

yebai commented 1 month ago

I like the ARGS approach and was not aware of it previously. One issue with standard practices is that they tend to be heavily influenced by early design decisions in the community and then evolve slowly. If we put a one-line testing instruction in README, maybe people can start learning about this ARGS option.

devmotion commented 1 month ago

I'm still not convinced (but also don't care too much if everyone else has a different opinion) - I think it's good to adopt improvements even if they are not common practice yet, but I don't see why/how ARGS is superior to ENV, and therefore I would go with the more established and better-known approach.

mhauru commented 1 month ago

I don't really have anything to add to my earlier comments on command line arguments vs env vars, except that I don't think the args setup would get in the way when adding/modifying existing tests. By default it runs all tests, and there's no manual grouping of tests so you can add new ones even if you're oblivious that there even is a way to run tests selectively. I appreciate that consistency with related packages has value though. I'm happy to say it's Hong's call, but that's probably because he seems to prefer what I prefer. :)

Regardless of the final design, I'll add a note in the contributing docs page on how to run the tests once this has been merged.

In other news, I took all the files in test/test_utils and either made them into modules that are now imported with relative imports, or, in cases where only one test file was using the utils, merged them into the test file. I also deleted a bunch of code from the utils that wasn't being used anywhere.

There's also test/skipped which is about 900 lines of code not included in runtests.jl. Might these tests be brought back at some point, or are they run manually in some particular situations?

yebai commented 4 weeks ago

One suggestion for CI setup is that failures of particular CI tasks should not cancel other CI tasks so we can independently identify multiple CI failures in parallel. This is now more doable than before due to the much shorter runtime of each CI task.

willtebbutt commented 4 weeks ago

@yebai should this not be a patch version bump? It looks non-breaking

yebai commented 4 weeks ago

There are breaking changes on the master branch, which have not been released yet.

mhauru commented 3 weeks ago

One suggestion for CI setup is that failures of particular CI tasks should not cancel other CI tasks so we can independently identify multiple CI failures in parallel.

Good point, should now be fixed.

Assuming CI passes (the previous error was random, see #2234), I have no other changes in mind that I would like to make. I also don't have merge rights, so someone else needs to make the call on when this is final.

torfjelde commented 3 weeks ago

I'd also add my vote to sticking with ENV vs. ARGS, as I've voiced before this PR. My argument is the same as @devmotion's: this is what's done elsewhere within the ecosystem we generally work with. We (the developers) now have to be aware of which package uses ARGS and which uses ENV variables when running tests, which just seems like an unnecessary headache without any added benefit.

Also, IMO, when "trying out" stuff like this, it's better to do it in some of the "simpler" packages and then eventually propagate to Turing.jl, rather than first adding it to Turing.jl.

The rest of the PR I think is a very, very good idea @mhauru :) Very keen to have the CI become much less of a pain :tada:

mhauru commented 3 weeks ago

For those who would prefer env vars, how would you like to implement being able to include and exclude multiple test files in a given run? In other words, what would be the env var equivalent of doing Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])?

without any added benefit

I do see some benefits, the most important ones being

  1. Respecting Pkg.test's intended way for selecting tests to run.
  2. Being more in line with the wider software ecosystem, e.g. how testing frameworks in other languages work.

I commented on 1. more extensively above, but to expand on 2., giving command line arguments to include/exclude particular tests is the standard way to do things in, for instance, pytest and the standard unit testing frameworks of both Go and Rust. I haven't used all that many testing frameworks, but I haven't seen SciML's env var approach elsewhere. I presume this consistency with other languages and frameworks is also why Pkg.test has the test_args mechanism, and why Julia's stdlib uses it.

Using environment variables for test selection also goes against my idea of what env vars are meant for. I view env vars as global flags that set some state about, well, the environment in which the process operates, kind of like defining a global constant at the top of your source file that affects everything in that package/module. Selectively running tests is more like passing arguments to a function as you call it, where you might for instance call it back to back with different arguments, and that's exactly what command line arguments are for.

torfjelde commented 3 weeks ago

In other words, what would be the env var equivalent of doing Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])?

This is a really good argument :+1: You could of course implement the same parsing of an ENV variable, but I very much agree that it's much, much more natural to do so using ARGS.

Respecting Pkg.test's intended way for selecting tests to run.

I see that the docs mention that this is a use-case for args, but do they specifically recommend it over ENV variables?

I am genuinely quite keen on giving this pattern a go, but I'm also somewhat worried about doing this in Turing.jl :confused:

Has there been a similar discussion elsewhere? I.e. what's the reasoning of the rest of the ecosystem going with ENV variables?

mhauru commented 3 weeks ago

This is a really good argument 👍 You could of course implement the same parsing of an ENV variable, but I very much agree that it's much, much more natural to do so using ARGS.

Yeah, I think parsing complicated env vars gets awkward. You could define two env vars though, INCLUDE_TESTS and EXCLUDE_TESTS, both of which could be space-separated lists. I don't know whether that would satisfy the desire for consistency with SciML practices any more, though.
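Such a two-variable scheme could look roughly like this (INCLUDE_TESTS and EXCLUDE_TESTS are the hypothetical names from the sentence above, not anything the PR implements):

```julia
# Split a space-separated env var into a vector of substrings;
# an empty or unset variable yields an empty vector.
split_env(name) = filter(!isempty, split(get(ENV, name, ""), ' '))

includes = split_env("INCLUDE_TESTS")
excludes = split_env("EXCLUDE_TESTS")
```

The resulting vectors could then be fed into the same path-matching logic as the ARGS version, so only the front end would differ.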

I see that the docs mention that this is a use-case for args, but do they specifically recommend it over ENV variables?

No, I don't think it says anything about environment variables.

mhauru commented 3 weeks ago

The test failures seem unrelated to this PR. They seem instead to be about the recent ADTypes compat change interacting in funky ways with Optimization.jl version bounds when using Julia 1.7. I haven't gotten to the bottom of it yet.

devmotion commented 3 weeks ago

In other words, what would be the env var equivalent of doing

Probably something similar to JULIA_LOAD_PATH etc., where a single environment variable is split into a vector of paths/arguments based on a delimiter such as : (using e.g. split(ENV["..."], ':')).

Generally, I haven't encountered the need for such complex inclusion/exclusion of tests yet, so I'm a bit skeptical whether the additional complexity (both for users and developers) is worth it. Having the possibility to run all tests (typically the default, but not used in CI) and pre-defined groups of tests has been sufficient in all packages I have worked with. For running even smaller chunks of tests or subsets of test files in local development, in my experience the most convenient approach is typically to use TestEnv anyway.

yebai commented 3 weeks ago

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations? This way, we support both, adhering to the TestEnv workflow and allowing users to use additional options only available in ARGS. This should be a relatively simple block of code, so I don't expect it will increase developer mental overhead too much.

mhauru commented 3 weeks ago

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations?

Yep, we could definitely have both available. Have e.g. the ARGS parsing implemented as in the current version of this PR and append to the list of included tests the value of an env var called TURING_SELECT_TEST. Would everyone be happy with that?

Regardless of the ARGS/env vars choice, I would still like to stick to tests being selected by file path rather than by manually grouping them.

Consequently, calling the env var GROUP doesn't feel right, even though that seems to be what other SciML packages do.
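The combination floated above could be sketched as follows (TURING_SELECT_TEST is the hypothetical variable name from this comment; the comma delimiter is an assumption):

```julia
# Combine the command-line selection with any patterns given via an
# environment variable, so both mechanisms feed the same filter.
function combined_selection(args = ARGS)
    selected = String.(args)
    env_selection = get(ENV, "TURING_SELECT_TEST", "")
    isempty(env_selection) || append!(selected, split(env_selection, ','))
    return selected
end
```

Either mechanism alone would then keep working unchanged, since an unset env var contributes nothing.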

torfjelde commented 3 weeks ago

For running even smaller chunks of tests or subsets of test files in local development in my experience typically the most convenient approach is to use TestEnv anyway.

So I've also done the same, but having had experience with testing in other languages, I do sometimes miss the ability to simply filter tests based on their name or whatever. It's particularly nice when you're working on bugfixes, since then there's a particular sub-test-suite you want to run, which usually requires copy-pasting to another file, using TestEnv.jl, etc.

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations?

Both also seem non-ideal, no? :confused: I'd rather have to do that (push onto ARGS) manually than have us support both.

yebai commented 3 weeks ago

Both also seem non-ideal, no? 😕 I'd rather have to do that (push onto ARGS) manually than have us support both.

I'm happy to support only the ARGS approach from my side: I think the ARGS approach has fairly sensible mental overhead, given its similarity to standard terminal commands.

torfjelde commented 3 weeks ago

I'm also slowly coming to terms with the ARGS approach. In hindsight, I'd much prefer to have tried this out in another package, e.g. Bijectors or something, which is "less mission critical". But given the fantastic work @mhauru has done here, it might be worth just giving it a go in Turing.jl :thinking:

yebai commented 3 weeks ago

@mhauru, feel free to merge this PR when you are ready.

mhauru commented 3 weeks ago

Given @yebai and @torfjelde have come around to args and @devmotion said earlier that he's not too bothered if others are happy with this, I'll go and merge. Everyone seems to agree that this is regardless an improvement over the current situation, and if people miss having a SciML-style GROUP env var we can easily add it by building on the mechanisms introduced here.

torfjelde commented 3 weeks ago

For other people who are lazy: you might find it useful to add a file called, say, test/testrunner.jl which contains the following:

using Pkg
Pkg.test("Turing", test_args=ARGS)

then you can do

julia --project test/testrunner.jl a b c --skip d e

torfjelde commented 3 weeks ago

Even better, in general you can do:

using Pkg
Pkg.test(Pkg.project().name, test_args=ARGS)

though Pkg.project is apparently an experimental feature.

yebai commented 3 weeks ago

@torfjelde can you add this tip to the following docs page: https://turinglang.org/docs/tutorials/docs-01-contributing-guide/#tests

torfjelde commented 3 weeks ago

Question is whether we should just have a small script test/testrunner.jl in the actual repo that does this? Almost seems worth it, no?

mhauru commented 3 weeks ago

This seems like a common problem for many Julia packages, is there no standard solution to this? It feels frustrating that julia --project test/runtests.jl -- a b c --skip d e so nearly works. The only reason it doesn't is that it doesn't create a TestEnv.

devmotion commented 3 weeks ago

I'd suggest running

julia --project=. -e 'import Pkg; Pkg.test(; test_args=ARGS)' -- a b c --skip d e

if you don't work in a Julia REPL and want to execute the tests.

yebai commented 3 weeks ago

@devmotion, it looks very neat -- I added it to the docs: https://turinglang.org/docs/tutorials/docs-01-contributing-guide/#tests