alan-turing-institute / deepsensor

A Python package for tackling diverse environmental prediction tasks with neural processes (NPs).
https://alan-turing-institute.github.io/deepsensor/
MIT License
72 stars, 15 forks

Add unit tests for data getters #89

Open kallewesterling opened 11 months ago

kallewesterling commented 11 months ago

Currently, there are no tests for the data getters (in deepsensor.data.sources).

Suggestion: Add some tests with a very small download for each source.

This is up for debate: The auxiliary files download a few gigabytes of data so a full unit test for each download is likely not doable. Should we just unit test the era5 and station ones? Or do some other kind of slicing of the downloads?

tom-andersson commented 4 months ago

This sounds reasonable, though I’m not 100% convinced about testing the download itself. Even assuming we can write fast unit tests for the ERA5 and station getters, these tests would make the CI depend on an internet connection and would be flaky if those sources go down.

If we just want to test the interface of the getters and the non-download logic, then I’d prefer that we add a dry_run option to the getters which, rather than running the download, generates empty data with the same structure.
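A minimal sketch of what such a dry_run option could look like, using only the standard library. Everything here is hypothetical and for illustration: `get_example_data`, `_download_records` and the `COLUMNS` schema are not part of deepsensor.data.sources, and a real getter would return whatever object type the actual download produces.

```python
from typing import Dict, List, Tuple

# Hypothetical schema; the real getters define their own output structure.
COLUMNS = ("time", "lat", "lon", "value")


def _download_records(date_range: Tuple[str, str]) -> List[dict]:
    """Placeholder for the real network download (hypothetical)."""
    raise RuntimeError("network access is not part of this sketch")


def get_example_data(date_range: Tuple[str, str], dry_run: bool = False) -> Dict[str, list]:
    """Hypothetical getter with a dry_run switch.

    Argument validation (the non-download logic) always runs, so it can be
    unit tested without touching the network.
    """
    start, end = date_range
    # ISO 8601 date strings compare correctly as plain strings.
    if start > end:
        raise ValueError("date_range start must not be after end")

    if dry_run:
        # Same keys as a real result, but no rows and no download.
        return {name: [] for name in COLUMNS}

    records = _download_records(date_range)
    return {name: [record[name] for record in records] for name in COLUMNS}
```

A test can then call `get_example_data(("2020-01-01", "2020-01-02"), dry_run=True)` and check the returned structure and the argument validation, with no internet connection needed.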

@davidwilby any thoughts?

davidwilby commented 4 months ago

I think it's inadvisable for a test suite to actually download anything in the routine running of tests (e.g. in CI or during routine development), for the reasons you mention @tom-andersson. It is definitely good, though, to test both the downloader and the source from which downloads are made, in some automated way.
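As an illustration only (not necessarily the approach referred to in this thread), one common way to separate the two concerns is to mock the network call in routine tests and keep a live check behind an opt-in switch. The sketch below uses the standard library's `unittest.mock`; `fetch_station_count`, `LIVE_URL` and the `RUN_NETWORK_TESTS` environment variable are hypothetical names, not part of deepsensor.

```python
import io
import json
import os
import unittest
import urllib.request
from unittest import mock

# Hypothetical placeholder; a real test would point at the actual data source.
LIVE_URL = "https://example.invalid/stations.json"


def fetch_station_count(url: str) -> int:
    """Hypothetical downloader: fetch a JSON document and count its stations."""
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return len(payload["stations"])


class OfflineDownloadTest(unittest.TestCase):
    """Routine test: the network call is replaced by an in-memory response."""

    def test_parses_payload_without_network(self):
        body = json.dumps({"stations": [{"id": 1}, {"id": 2}]}).encode()
        fake_response = io.BytesIO(body)  # supports `with` and `.read()`
        with mock.patch("urllib.request.urlopen", return_value=fake_response) as opened:
            self.assertEqual(fetch_station_count(LIVE_URL), 2)
        opened.assert_called_once_with(LIVE_URL)


@unittest.skipUnless(
    os.environ.get("RUN_NETWORK_TESTS") == "1",
    "live source check is opt-in (hypothetical RUN_NETWORK_TESTS switch)",
)
class LiveSourceTest(unittest.TestCase):
    """Opt-in test: exercises the real source, e.g. from a scheduled job."""

    def test_source_is_reachable(self):
        self.assertGreaterEqual(fetch_station_count(LIVE_URL), 0)
```

Running `python -m unittest` executes only the offline test by default. The live test runs when the opt-in variable is set, which keeps routine CI independent of the remote source while still allowing it to be checked automatically on a separate schedule.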

The approach I've taken previously in similar situations is to:

tom-andersson commented 2 months ago

Thanks @davidwilby, I'll leave this issue open in case anyone wants to pick it up from your suggestions.