brainglobe / brainglobe-workflows

Workflows that utilise BrainGlobe tools to perform data analysis and visualisation.
BSD 3-Clause "New" or "Revised" License

Run cellfinder benchmarks on small data #94

Closed · sfmig closed this 2 months ago

sfmig commented 2 months ago

Description


Why is this PR needed? We are exploring a systematic way to benchmark brainglobe workflows using asv.

This PR fixes some issues running the cellfinder workflow benchmarks (1) on a small GIN dataset and (2) on data available locally.

What does this PR do? It makes the fixes required for both scenarios; the sections below describe how to run the benchmarks in each case.

To run the benchmarks locally on a small dataset from GIN:

  1. Check out this branch to get the latest version of the benchmarks locally.

  2. Create a conda environment and pip install asv:

    conda create -n asv-check python=3.10
    conda activate asv-check
    pip install asv

    Note that to run the benchmarks you do not need to install a development version of brainglobe-workflows, since asv creates a separate Python virtual environment in which to run them. However, for convenience we do include asv in the dev dependencies, so you can also run the benchmarks from a dev environment.

  3. For a quick check, run one iteration per benchmark with

    asv run -q
    • You can add -v --show-stderr for more verbose output.
    • This will install the brainglobe-workflows package from the tip of the currently checked-out local branch into the asv virtual environment, and run the (locally defined) benchmarks against it.
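    • Once the run finishes, you can print the recorded timings with `asv show`, or compare the results of two commits with `asv compare`.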

To run the benchmarks on a locally available dataset:

  1. Define a config file for the workflow to benchmark. You can use the default one at brainglobe_workflows/configs/cellfinder.json for reference; see the sketch after this list for a minimal example.

    • Ensure your config file includes an input_data_dir field pointing to the data of interest.
    • Edit the names of the signal and background directories if required. By default, they are assumed to be in signal and background subdirectories under input_data_dir, but these defaults can be overridden with the signal_subdir and background_subdir fields.
  2. Create and activate an environment with asv (follow steps 1 and 2 from above).

  3. Run the benchmarks in "quick mode", passing the path to your config file in the CONFIG_PATH environment variable. On Unix systems:

    CONFIG_PATH=/path/to/your/config/file asv run -q
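As a minimal sketch of step 1, the snippet below generates a config pointing at local data. The output file name and data path are placeholders; a real config would typically start from a copy of the default cellfinder.json and keep its other fields:

```python
# Hypothetical sketch: generate a benchmark config for local data.
# "/path/to/my/data" and the output file name are placeholders.
import json
from pathlib import Path

config = {
    "input_data_dir": "/path/to/my/data",  # your local data
    "signal_subdir": "signal",             # default subdirectory name
    "background_subdir": "background",     # default subdirectory name
}

Path("my_cellfinder_config.json").write_text(json.dumps(config, indent=2))
```

You could then run `CONFIG_PATH=$PWD/my_cellfinder_config.json asv run -q` as above.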

Troubleshooting

You may find that the conda environment creation fails because of a known upstream issue: asv assumes a conda CLI syntax that changed with the latest release (in conda 24.3.0, --force became --yes).

A PR is on the way upstream; as a temporary workaround, you can pin conda from the base environment with `conda install -y "conda<24.3"`.

References

See issue #9.

Also related is issue #98, which I am currently investigating.

Further context

We currently have asv benchmarks for the three main steps of the cellfinder workflow, as well as a benchmark for the full workflow.

We envisioned the benchmarks being useful to developers in three main ways.

A reminder of how asv works: for each benchmarked commit, asv creates an isolated virtual environment, installs the project into it, and runs the benchmarks it discovers in the benchmarks directory; results are saved as JSON and can be compared across commits.
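For illustration, here is a minimal sketch of the kind of benchmark class asv discovers (the class, method, and data below are hypothetical, not the actual benchmarks in this PR):

```python
import numpy as np


class TimeExampleStep:
    """asv collects classes like this from the benchmarks directory
    and times every method whose name starts with ``time_``."""

    def setup(self):
        # Runs before each measurement, so the timed method below
        # excludes the cost of creating this synthetic image stack.
        rng = np.random.default_rng(seed=0)
        self.stack = rng.random((10, 100, 100))

    def time_threshold_stack(self):
        # asv measures the wall-clock time of this method body.
        (self.stack > 0.5).sum()
```

`asv run` executes these methods inside the isolated environment and records one timing entry per benchmark per commit.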

How has this PR been tested?

The benchmarks are checked with a CI job, rather than with explicit tests. This follows the general approach in the field - see #96 for more details.

Since we don't plan to test the benchmarks with pytest, I have excluded them from the coverage computation.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

The README has been updated to better reflect the current status.


codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.45%. Comparing base (b5f62ef) to head (f800053).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   79.38%   84.45%   +5.06%
==========================================
  Files          18       17       -1
  Lines         917      862      -55
==========================================
  Hits          728      728
+ Misses        189      134      -55
```
