A suite of benchmarking tests for BrainGlobe

alessandrofelder commented 1 year ago

A key requirement for all BrainGlobe tools is that they can run in a reasonable time (say, hours to 1 day) on PhD student laptops, to make the tools accessible to everyone. It is therefore important that we ensure that future changes to the code don't cause significant performance loss (especially considering https://github.com/brainglobe/cellfinder-core/issues/170 !)

Ultimately, we'd therefore like to have a suite of benchmarks that form part of the tests of all maintained BrainGlobe packages. A preliminary discussion in a developer meeting suggests airspeed-velocity/asv as a useful tool to achieve this (but this is not set in stone).

A naive, initial approach might be:

[ ] familiarisation with asv
[ ] prototype benchmark suite for one repo
[ ] move reusable parts to a bg_utils package (maybe a benchmarking submodule?)
[ ] import from bg_utils and re-use in other repos
[ ] also benchmark real-life use cases (on our internal GH runner)

Resources

Profiling tools that have come in handy before, and that may help us understand what to benchmark and deepen our general understanding of our code's performance, are:

pyinstrument
memory_profiler

adamltyson commented 1 year ago

Just to flesh this out, I would say the ideal case initially would be to have:

For cellfinder-core

A job that runs in CI (I think parallel to the tests) running cell detection and classification on a large-ish 3D image (say something that takes ~5 mins?).
A larger job that runs on >1 whole-mouse brain (~200GB) that can run on our internal runners weekly (and can be scheduled manually).

For brainreg

A job that registers a small (e.g. downsampled) brain image to an atlas. We have this in the tests, but for a parallel benchmarking test it could maybe be higher resolution
A larger job registering full-resolution data to a number of resolutions of the Allen mouse brain atlas

All of these jobs should fail if run time or peak memory usage exceeds some baseline (TBC) by more than 10% (?).
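
To make the pass/fail rule above concrete, here is a minimal sketch of such a check using only the Python standard library. All names (`measure`, `exceeds_baseline`, `check_against_baseline`) and the 10% default are hypothetical and only illustrate the idea; the baseline values would have to come from wherever we end up storing them. Note that `tracemalloc` only sees allocations made through Python's allocator and slows the code down, so the timing here is inflated and a real job would probably measure time and memory in separate runs.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Return (elapsed_seconds, peak_bytes) for a single call to func.

    Peak memory covers only allocations traced by tracemalloc
    (i.e. made via Python's allocator), not the whole process.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return elapsed, peak


def exceeds_baseline(measured, baseline, tolerance=0.10):
    """True if measured is more than `tolerance` (a fraction) above baseline."""
    return measured > baseline * (1 + tolerance)


def check_against_baseline(func, baseline_seconds, baseline_peak_bytes, tolerance=0.10):
    """Run func once and raise AssertionError if either metric regressed."""
    elapsed, peak = measure(func)
    failures = []
    if exceeds_baseline(elapsed, baseline_seconds, tolerance):
        failures.append(f"run time {elapsed:.3f}s vs baseline {baseline_seconds:.3f}s")
    if exceeds_baseline(peak, baseline_peak_bytes, tolerance):
        failures.append(f"peak memory {peak}B vs baseline {baseline_peak_bytes}B")
    if failures:
        raise AssertionError("performance regression: " + "; ".join(failures))
    return elapsed, peak
```

A CI job could call `check_against_baseline` on the detection or registration entry point and let the raised `AssertionError` fail the job.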

Stretch goals

Also run cellfinder-core on GPU (e.g. on local cluster via internal GH actions runner)
Run whole-brain tests on a variety of different images (e.g. different numbers of labelled cells)
Run registration with different atlases/species

What do you think @alessandrofelder?

adamltyson commented 1 year ago

FYI - #23

If there's lots of shared benchmarking code it could live in bg-benchmarking, if there isn't much it could be bg-utils.benchmarking, perhaps with non-runtime dependencies downloaded with pip install bg-utils[benchmarking]?

adamltyson commented 1 year ago

@sfmig one idea that isn't in this issue is to look at the timings from the original brainreg and cellfinder papers. The machines used to generate these benchmarks are still available, and we can run both the version released with the paper and the most recent to get a performance baseline.

sfmig commented 1 year ago

Some thoughts on benchmarking tools

After a bit of research I'd highlight these two:

Asv

"It works by saving (in a JSON file) the output of performance tests. The files are tagged with machine, number of cores, commit, etc. The idea is to save the JSON files over time to be able to compare run times. Visualization is supported. It works by creating a new virtual env, installing the software version of a given commit into it, and timing the [benchmarking] tests." -- from this comment
Disadvantages (from here)
- "tied to one specific use case, namely, running a suite of benchmarks across a history of commits for a project, and analysing the history of runtimes."
- "There's no concept of "fast benchmarks" which should run every time and "slow benchmarks" that should only run, e.g., on every release (similar to "fast" and "slow" tests)" ----> this could be a problem?
- asv is designed around "1. Identify performance regressions", but it is difficult to adapt it for other cases (e.g. we may want to do "5. Provide explicit targets of known important use-cases for optimization", see here)
- "the main problem here is that asv is mostly abandoned and has zero maintainers or contributors right now" ---last release Feb 2022; also mentioned in this comment "the tool seems not well maintained and not very flexible"
- in short, it seems a bit narrow in its package history focus and not very well maintained, but used because there is nothing better so far?
Other tools using it:
- sympy
- this neuroscience analysis tool
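
For reference, a rough sketch of what an asv benchmark file can look like. asv discovers benchmarks by name prefix (`time_` for run time, `peakmem_` for peak memory) and calls `setup` before each one, so the file itself does not need to import asv. The workload `detect_bright_voxels`, the class name and the parameter values are all made up for illustration and are not part of any BrainGlobe package; a real suite would call into cellfinder-core or brainreg instead.

```python
import random


def detect_bright_voxels(volume, threshold):
    """Hypothetical stand-in for a real workload (e.g. cell detection)."""
    return [
        (z, y, x)
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value > threshold
    ]


class DetectionSuite:
    # asv runs every benchmark in the class once per parameter value
    params = [8, 16]
    param_names = ["side_length"]
    # seconds asv allows a benchmark to run before killing it
    timeout = 120

    def setup(self, side_length):
        # Runs before each benchmark; its cost is not included in the timing
        rng = random.Random(0)
        self.volume = [
            [[rng.random() for _ in range(side_length)] for _ in range(side_length)]
            for _ in range(side_length)
        ]

    def time_detection(self, side_length):
        # "time_" prefix: asv records the run time of this method
        detect_bright_voxels(self.volume, threshold=0.9)

    def peakmem_detection(self, side_length):
        # "peakmem_" prefix: asv records the peak memory of the process
        detect_bright_voxels(self.volume, threshold=0.9)
```

Because the suite is plain Python, the same class can also be exercised directly (instantiate, call `setup`, call a benchmark method) when debugging a benchmark outside asv.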

Pyperf

pyperf's main focus is to run individual stable benchmarks
- it has nice functionality like detecting if a benchmark result seems unstable and calibration
- it can run an individual benchmark (which apparently is hard with asv)
It has no support for:
- setting up a suite of benchmarks, ---> is this right tho? what about the Python benchmark suite?
- running benchmarks over a project history, or
- for setting up isolated testing environments (from here)
Who uses it?
- the Python benchmark suite
- yt-project

Other comments

pytest fixture also seems nice, mostly because it integrates with pytest but it seems a bit limited. It is built around benchmarking a function (so you may need to define wrapper functions for more complex things).

Suggested next steps

After the dev meeting today, we suggest going forward with asv as it mostly aligns with our aim of detecting performance regression, and seems to fit well with CI.
We may use pyperf for the longer running, more realistic benchmarks that we plan to run (with less frequency) on the SWC computer

sfmig commented 1 year ago

Profiling tools

We'd like to use profilers to identify bottlenecks, to narrow down what we need to benchmark. From having a look at the suggested tools here and in the top comment of the thread:

pyinstrument: for runtime profiling. We already have some benchmarks with it on cellfinder-core, so it seems reasonable to continue with it.
memory_profiler: for memory profiling, it is also part of those existing benchmarks. The tool is no longer actively maintained, but I can't find a good alternative - memray could be an option (it also has a pytest plugin) but it is not available for Windows. Leaving this on hold for now (we'll have some memory data from the asv benchmarks anyways).

alessandrofelder commented 1 year ago

https://github.com/brainglobe/cellfinder-core/pull/184/files#r1269597227 so we don't forget that we'd like/expect the benchmarking instructions to become more BG-specific in the future.
