brainglobe / brainglobe-workflows

Workflows that utilise BrainGlobe tools to perform data analysis and visualisation.
BSD 3-Clause "New" or "Revised" License

Run cellfinder benchmarks on small data #94

Closed · sfmig closed this 2 months ago

sfmig commented 2 months ago

Description


Why is this PR needed? We are exploring a systematic way to benchmark brainglobe workflows using asv.

This PR fixes some issues running the cellfinder workflow benchmarks (1) on a small GIN dataset and (2) on data available locally.

What does this PR do? It makes the fixes required for both scenarios; the sections below describe how to run the benchmarks in each case.

To run the benchmarks locally on a small dataset from GIN:

  1. Check out this branch to get the latest version of the benchmarks locally.

  2. Create a conda environment and pip install asv:

    conda create -n asv-check python=3.10
    conda activate asv-check
    pip install asv

    Note that to run the benchmarks you do not need to install a development version of brainglobe-workflows, since asv creates a separate Python virtual environment in which to run them. However, for convenience we do include asv in the dev dependencies, so you can also run the benchmarks from a dev environment.

  3. For a quick check, run one iteration per benchmark with

    asv run -q
    • You can add -v --show-stderr for more verbose output.
    • This will install the brainglobe-workflows package from the tip of the currently checked-out local branch into the asv virtual environment, and run the (locally defined) benchmarks against it.
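    • Once the run finishes, you can print the recorded timings with `asv show`, or compare the results of two commits with `asv compare`.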

To run the benchmarks on a locally available dataset:

  1. Define a config file for the workflow to benchmark. You can use the default one at brainglobe_workflows/configs/cellfinder.json for reference; see the sketch after this list for a minimal example.

    • Ensure your config file includes an input_data_dir field pointing to the data of interest.
    • Edit the names of the signal and background directories if required. By default, they are assumed to be in signal and background subdirectories under input_data_dir, but these defaults can be overridden with the signal_subdir and background_subdir fields.
  2. Create and activate an environment with asv (follow steps 1 and 2 from above).

  3. Run the benchmarks in "quick mode", passing the path to your config file in the CONFIG_PATH environment variable. On Unix systems:

    CONFIG_PATH=/path/to/your/config/file asv run -q
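As a minimal sketch of step 1, the snippet below generates a config pointing at local data. The output file name and data path are placeholders; a real config would typically start from a copy of the default cellfinder.json and keep its other fields:

```python
# Hypothetical sketch: generate a benchmark config for local data.
# "/path/to/my/data" and the output file name are placeholders.
import json
from pathlib import Path

config = {
    "input_data_dir": "/path/to/my/data",  # your local data
    "signal_subdir": "signal",             # default subdirectory name
    "background_subdir": "background",     # default subdirectory name
}

Path("my_cellfinder_config.json").write_text(json.dumps(config, indent=2))
```

You could then run `CONFIG_PATH=$PWD/my_cellfinder_config.json asv run -q` as above.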

Troubleshooting

You may find that the conda environment creation fails because of a known upstream issue: asv assumes a conda CLI syntax that changed with the latest release (in conda 24.3.0, --force became --yes).

A PR is on the way upstream; as a temporary workaround, you can pin conda from the base environment with `conda install -y "conda<24.3"`.

References

See issue #9.

Also related is issue #98, which I am currently investigating.

Further context

We currently have asv benchmarks for the three main steps of the cellfinder workflow, as well as a benchmark for the full workflow.

We envisioned the benchmarks being useful to developers in three main ways.

A reminder of how asv works: for each benchmarked commit, asv creates an isolated virtual environment, installs the project into it, and runs the benchmarks it discovers in the benchmarks directory; results are saved as JSON and can be compared across commits.
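For illustration, here is a minimal sketch of the kind of benchmark class asv discovers (the class, method, and data below are hypothetical, not the actual benchmarks in this PR):

```python
import numpy as np


class TimeExampleStep:
    """asv collects classes like this from the benchmarks directory
    and times every method whose name starts with ``time_``."""

    def setup(self):
        # Runs before each measurement, so the timed method below
        # excludes the cost of creating this synthetic image stack.
        rng = np.random.default_rng(seed=0)
        self.stack = rng.random((10, 100, 100))

    def time_threshold_stack(self):
        # asv measures the wall-clock time of this method body.
        (self.stack > 0.5).sum()
```

`asv run` executes these methods inside the isolated environment and records one timing entry per benchmark per commit.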

How has this PR been tested?

The benchmarks are checked with a CI job, rather than with explicit tests. This follows the general approach in the field - see #96 for more details.

Since we don't plan to test the benchmarks with pytest, I have excluded them from the coverage computation.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

The README has been updated to better reflect the current status.


codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.45%. Comparing base (b5f62ef) to head (f800053).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   79.38%   84.45%   +5.06%
==========================================
  Files          18       17       -1
  Lines         917      862      -55
==========================================
  Hits          728      728
+ Misses        189      134      -55
```
