Closed by sfmig 5 months ago
To test with the largest data, could these workflows not be run on our internal runner, and the data live there permanently?
Yes @adamltyson, that is exactly the plan.
This issue was migrated from the cellfinder repo, where we initially started the benchmarking work following a "modular" approach (i.e., benchmarking individual functions rather than a whole workflow). These comments are slightly outdated now; I think the comments on the size of the data came from a discussion about how to determine what counts as a small vs. large dataset.
I will close this issue now since:
Ignore me, I got a notification about this issue (because it was transferred), and thought it was a new issue!
Add a benchmark for the main function (detect+classify)
For this benchmark, try fetching larger data (larger than the test data in the repo, but not as large as a fully realistic scenario) until the benchmarking time is reasonable:
- `pooch` could be used to fetch and cache the data
- this PR may be a good example to follow
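The fetch-and-cache step above could be sketched like this. It is a minimal, standard-library stand-in for what `pooch` provides (download once, verify a checksum, reuse the cached copy); the function names, URL, and hash here are placeholders for illustration, not the project's actual data:

```python
"""Sketch: fetch a larger benchmark dataset on demand and cache it locally."""
import hashlib
import urllib.request
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_benchmark_data(url: str, known_hash: str, dest: Path) -> Path:
    """Download `url` to `dest` unless a verified copy is already cached."""
    if dest.exists() and file_sha256(dest) == known_hash:
        return dest  # cached copy is valid, skip the download
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    if file_sha256(dest) != known_hash:
        raise ValueError(f"Checksum mismatch for {dest}")
    return dest
```

In a real setup, `pooch.retrieve(url=..., known_hash=..., path=...)` would handle all of this, plus progress bars and a sensible cache directory, which is why it is suggested above.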