OSIPI / TF2.4_IVIM-MRI_CodeCollection

OSIPI TF2.4: IVIM MRI code collection

Improve runtime of Algorithm Analysis workflow #43

Closed: etpeterson closed this issue 6 months ago

etpeterson commented 6 months ago

Feature description

The Algorithm Analysis workflow parallelizes its jobs as a matrix; however, because the jobs have very uneven processing durations, a significant amount of wall time is wasted.

Describe the solution

The Algorithm Analysis workflow uses matrix job parallelization. This is easy to set up and use, but the longest jobs take about 15 minutes while the shortest take seconds, and every job spends more than a minute setting up its environment. As a result, for many of the shorter runs the vast majority of the time goes to environment setup rather than to the tests themselves.

The solution requires some investigation, but it probably involves timing the algorithms and partitioning them into groups with reasonably similar total execution time. From there, a test matrix that references the manually curated groups should reduce the wall-clock time of the test run. Other approaches may work as well; this issue involves some research.
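As a minimal sketch of how such groups could be formed (not a design decision for this repository), a greedy longest-processing-time assignment over measured per-algorithm runtimes keeps group totals roughly balanced. The algorithm names, timings, and group count below are placeholders:

```python
# Hypothetical sketch: split algorithms into a fixed number of matrix groups
# with roughly equal total runtime, using greedy longest-processing-time
# (LPT) assignment. The names and numbers are illustrative placeholders,
# not the repository's actual measurements.

def balance_groups(timings: dict[str, float], n_groups: int) -> list[list[str]]:
    """Assign each algorithm to the group with the smallest running total."""
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    totals = [0.0] * n_groups
    # Place the slowest algorithms first so they anchor the groups.
    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        idx = totals.index(min(totals))
        groups[idx].append(name)
        totals[idx] += seconds
    return groups

if __name__ == "__main__":
    example_timings = {"alg_a": 900.0, "alg_b": 450.0, "alg_c": 30.0, "alg_d": 5.0}
    for i, group in enumerate(balance_groups(example_timings, n_groups=2)):
        print(f"group {i}: {group}")
```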

Describe alternatives

No response

Additional context

No response

Are you working on this?

None

AhmedBasem20 commented 6 months ago

Hey @etpeterson, since I'm already exploring this workflow, may I tackle this one? Thanks.

etpeterson commented 6 months ago

@AhmedBasem20 you're welcome to take a look. The workflow currently works, so this is an enhancement rather than a fix, and it should be achievable. That said, if it turns out to be much too complicated, it's probably not worth changing. I also don't have a concrete idea of what the final result should look like.

Here are some strategies I've considered but haven't followed through on.

I think these could all be tried together, or just one of them. Keep in mind that something is better than nothing: if you end up making some enhancements but still have a plan for more, you can either continue yourself or write another issue for someone else to pick up.

Another note: the workflow output already saves timings, so there is existing information on how fast each algorithm is. The tests could also change a little; nothing holds us to exactly these parameters if there is some optimization to be done there.
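One way the matrix could reference curated groups (sketched here under assumed names; the `ALGORITHM_GROUP` environment variable, the group contents, and the test body are illustrative, not the repository's actual configuration) is to have each matrix job export a group identifier that the pytest parametrization filters on:

```python
# Hypothetical sketch: each matrix job sets ALGORITHM_GROUP, and the test
# parametrization only includes that group's algorithms. Group names and
# contents are placeholders, not the repository's real setup.
import os

import pytest

ALGORITHM_GROUPS = {
    "fast": ["alg_c", "alg_d"],
    "slow": ["alg_a", "alg_b"],
}

def selected_algorithms() -> list[str]:
    group = os.environ.get("ALGORITHM_GROUP")
    if group is None:
        # No group requested (e.g. running locally): test everything.
        return [alg for algs in ALGORITHM_GROUPS.values() for alg in algs]
    return ALGORITHM_GROUPS[group]

@pytest.mark.parametrize("algorithm", selected_algorithms())
def test_algorithm_analysis(algorithm):
    # Placeholder body; the real test would run the analysis for `algorithm`.
    assert algorithm
```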

AhmedBasem20 commented 6 months ago

Thanks @etpeterson for breaking this down! I'll open a pull request with the specified subtasks and work on each one individually.