Add unit tests for data getters

kallewesterling commented 11 months ago

Currently, there are no tests for the data getters (in deepsensor.data.sources).

Suggestion: Add some tests with a very small download for each one source.

This is up for debate: The auxiliary files download a few gigabytes of data so a full unit test for each download is likely not doable. Should we just unit test the era5 and station ones? Or do some other kind of slicing of the downloads?

tom-andersson commented 4 months ago

This sounds reasonable, though I’m not 100% convinced about testing the download itself. Assuming we can write fast unit tests for the ERA5 and station getters, these tests would add a dependency on internet connection for the CI and would be flakey if those sources go down.

If we just want to test the interface of the getters and non-download logic then I’d prefer if we added a dry_run option to the getters that rather than running the download, instead generates empty data that looks the same.

@davidwilby any thoughts?

davidwilby commented 4 months ago

I think that in a test suite it's inadvisable to actually download anything in the routine running of tests (e.g. in CI, or in routine development) for the reasons you mention @tom-andersson but that it's definitely good to test both the downloader and also the source from which downloads are made, in some automated way.

The approach I've taken previously in similar situations is to:

include automated tests which use mocking to test the source code itself without actually downloading from the external data source. These tests can be included in the normal test suite.
also have a separate set of tests that check that the external data sources are functioning as expected. These tests should be excluded from the normal suite of tests, but can be run explicitly in CI on a schedule, to give us early warning about changes made to external data sources which can break deepsensor functionality.

tom-andersson commented 2 months ago

Thanks @davidwilby, I'll leave this issue open in case anyone wants to pick it up from your suggestions.

alan-turing-institute / deepsensor

Add unit tests for data getters #89