fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Explore using `pooch` for test datasets #598

Closed tcompa closed 8 months ago

tcompa commented 8 months ago

Current situation

  1. In the future we may expand our test-data sources (e.g. by adding BIA #536). If so, we'll need reasonably flexible download procedures, and it'd be best to avoid repeating downloads on every CI run.
  2. Downloading data from Zenodo is currently based on custom functions, which requires us to know the Zenodo API (with occasional problems like #597) and to use a bunch of requests/urllib3 commands in our fixtures. These are mixed with actual fractal-tasks-core logic (e.g. a workaround to update the ROI tables), which obviously does belong in our CI. It's not always easy to separate the two logical components (downloading data, and preparing it for our needs).

Update proposal (not high priority)

An interesting option would be to use pooch (https://www.fatiando.org/pooch/latest/about.html). It abstracts the download of datasets and stores them in a local cache, to avoid repeated downloads. Compared to our current approach, it also includes checks to make sure that the local files match the original ones. In other words, it creates a local version of a bunch of datasets we may want to use for the CI or for examples.
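To make the idea concrete, here is a minimal stdlib-only sketch of the fetch-once-and-verify pattern described above. It is not pooch's implementation and not our current fixture code: `fetch_cached`, `sha256_of` and the injected `download` callable are hypothetical names used only for illustration. With pooch itself, `pooch.retrieve(url, known_hash, path=...)` should cover roughly the same ground.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(
    url: str,
    known_hash: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local copy of `url`, downloading only when needed.

    `download(url, target)` is a hypothetical callable that writes the
    remote file to `target`; keeping it separate means the download
    logic is not mixed with any dataset-preparation logic.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rstrip("/").split("/")[-1]

    # Cache hit: the file exists and its content matches the expected hash.
    if target.exists() and sha256_of(target) == known_hash:
        return target

    # Cache miss or corrupted file: download again, then verify.
    download(url, target)
    actual = sha256_of(target)
    if actual != known_hash:
        target.unlink()
        raise ValueError(
            f"Hash mismatch for {url}: expected {known_hash}, got {actual}"
        )
    return target
```

A fixture would then only call `fetch_cached(...)` and do the fractal-specific preparation (e.g. the ROI-table workaround) on the returned path, which keeps the two logical components apart.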

Note that this would also play well with GitHub CI: we can make sure that the pooch folder is cached, so that downloads are never re-triggered (or very rarely).
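For the CI cache to be effective, the data folder should live at a predictable path. A minimal sketch of one way to do that, assuming a hypothetical `FRACTAL_TEST_DATA_DIR` environment variable and a hypothetical default folder name (pooch offers something similar through the `env` argument of `pooch.create`):

```python
import os
from pathlib import Path


def resolve_test_data_dir(env_var: str = "FRACTAL_TEST_DATA_DIR") -> Path:
    """Return the folder where test datasets are cached.

    If the (hypothetical) environment variable is set, e.g. in the CI
    workflow, use it; otherwise fall back to a per-user cache folder.
    """
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser()
    # Assumed default location, chosen only for illustration.
    return Path.home() / ".cache" / "fractal-tasks-core-test-data"
```

The CI workflow would set the variable to a fixed folder and cache that folder between runs, while local runs fall back to the default.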

Here is how napari uses pooch in some of its examples: https://github.com/search?q=repo%3Anapari%2Fnapari+pooch+language%3APython&type=code&l=Python

And here is the same for scikit-image: https://github.com/search?q=repo%3Ascikit-image%2Fscikit-image+pooch+language%3APython&type=code&l=Python

jluethi commented 8 months ago

Sounds good! I recently heard the napari devs recommend pooch quite heavily in a napari workshop, so they seem to be happy with it.