Limited parallelism with many modules

As a user, I might want to split my tests up into multiple files, corresponding to separate modules. I would do this for separation of concerns, better organization, and reduce polluting the global namespace (e.g. with shared data or set-up/tear-down associated with several specific testsets). However, ReTest discourages this practice due to its current model of parallelism.

Consider the following example:

# runtests.jl

using Distributed

N_WORKERS = 2

addprocs(N_WORKERS, exeflags="--project=$(Base.active_project())")

# Propagate importing ReTest across all workers
@everywhere using ReTest

# Assume that each module_x.jl defines a `ModuleX` in the global namespace.
@everywhere include("path/to/module_1.jl")
@everywhere include("path/to/module_2.jl")
...
@everywhere include("path/to/module_M.jl")

# Call ReTest from the main process, and dispatch the workers across the testsets.
retest(Module1, Module2, ..., ModuleM; ...)

If M=100, and each ModuleX defined a single testset, the user would reasonably assume that ReTest would achieve reasonable parallelism and approximately halve the time taken due to having 2 workers. This is not the case, as ReTest only dispatches the workers across the testsets of a single module. This then means that enabling parallelism would have absolutely no effect at all, and due to overhead would simply be slower. On top of that, shuffling would also have no effect, as it also only works per-module, and so any hidden dependencies between modules would never be detected. This code block in src/ReTest.jl in the source confirms this:

for imod in eachindex(modules)
        mod, pat = modules[imod]
        tests = alltests[imod]
        ...
        shuffle &&
            shuffle!(tests)
        ...
        # TODO: move printer task out of worker?
        worker = @task begin
        ...

While this is fine and can be worked around (e.g. consolidating tests into a single file), this seems like an unnecessary restriction that the library imposes. Instead, it would be better if all of the testsets across all modules were gathered into a single global set that can then be divided across the workers. In this model of parallelism, the previewer task would work in a similar way to as it does currently, going through the testsets in the order that they appear to retest. Instead of having the workers follow in lockstep, they would simply return the results to the previewer process as and when they complete their testsets. When the previewer reaches a testset and the result is there, it immediately evaluates it and displays it. If it doesn't, it waits on the result until one of the workers completes it.

This has several advantages:

It improves parallelism in the case of many test modules and files
Parallel performance and good design through separation of concerns are no longer mutually exclusive goals
Shuffling can occur across modules as opposed to only per-module, greatly increasing the number of possible combinations
More opportunities for optimizing load-balancing are available, assuming the number of modules > 1
Developers that use a pattern like a single include("path/to/all_my_tests.jl") can continue to do so

I'd be happy to contribute towards this, as this seems like a significant improvement in flexibility for the developer and would go a long way towards making this a package that can be used for production.

JuliaTesting / ReTest.jl

Limited parallelism with many modules #46