OpenSenseAction / poligrain

Simplify common tasks for working with point, line and gridded sensor data, focusing on rainfall observations.
https://poligrain.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add functionality to download small and large example datasets #56

Open cchwala opened 2 weeks ago

cchwala commented 2 weeks ago

Is your feature request related to a problem? Please describe.

Currently we either use curl -OL some_url to download example data in the notebooks, or we use data from the tests/test_data directory. Neither is ideal, particularly if we want to add more datasets and make access to some large ones easy.

Describe the solution you'd like

I suggest adding a module src/poligrain/data.py (or example_data.py, or maybe datasets.py) containing functions that could be used like this: get_example_data('AMS PWS small') or get_example_data('AMS PWS full').

As a first step, we would add very small example datasets (at most a few hundred KB) to the repo, to be accessed by the function. Later we could add a separate repo for small example data.
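To make the first step concrete, here is a minimal sketch of what such a function could look like. Everything beyond the name get_example_data and the dataset name 'AMS PWS small' is an assumption for illustration: the registry dict, the file name, the example_data directory, the data_dir argument, and the choice to return a path rather than an opened dataset.

```python
from pathlib import Path

# Hypothetical registry: dataset name -> file name of the bundled file.
# The file name is a placeholder, not a file that exists in the repo.
SMALL_DATASETS = {
    "AMS PWS small": "ams_pws_small.nc",
}

# Assumed default location; inside the package this would more likely be
# resolved relative to the module or via importlib.resources.
DEFAULT_DATA_DIR = Path("example_data")


def get_example_data(name, data_dir=None):
    """Return the path to a small example dataset shipped with the repo."""
    try:
        file_name = SMALL_DATASETS[name]
    except KeyError:
        available = ", ".join(sorted(SMALL_DATASETS))
        msg = f"Unknown dataset {name!r}. Available: {available}"
        raise ValueError(msg) from None

    path = Path(data_dir or DEFAULT_DATA_DIR) / file_name
    if not path.is_file():
        raise FileNotFoundError(f"Example data file is missing: {path}")
    return path
```

Returning a path keeps the sketch free of I/O dependencies; the real function could just as well open the file and return the loaded data. A name-based registry also means notebooks and tests would no longer need hardcoded paths.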

In addition, we can download full datasets directly from Zenodo. Some data transformation would be required, though. There was preliminary work on this, e.g. in the ragali_prototpy here (which is more or less a 100% copy-paste of the same code in the OPENSENSE software sandbox, if I recall correctly). Hence, I would leave access to full datasets for later.
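For the later step, the download part could be sketched roughly as follows, using only the standard library. This is not the code from the earlier prototype; the function name fetch_file, its parameters, and the local cache layout are assumptions, and the data transformation mentioned above is left out entirely.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_file(url, file_name, cache_dir, sha256=None):
    """Download url to cache_dir/file_name unless it is already cached.

    Hypothetical helper: name, arguments and cache layout are assumptions.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / file_name

    if not target.exists():
        # Download under a temporary name first, so an interrupted transfer
        # never leaves a truncated file under the final name.
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(target)

    if sha256 is not None:
        # Hash in chunks so large files are not read into memory at once.
        digest = hashlib.sha256()
        with open(target, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != sha256:
            raise ValueError(f"Checksum mismatch for {target}")

    return target
```

Caching outside the repo keeps large binaries out of the package. A dedicated library such as pooch covers the same ground (download, cache, checksum) and might be preferable to maintaining this by hand.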

Describe alternatives you've considered

Just continue adding data to the repo... ;-). But since we are building a package that should serve as the basis for other packages, it should not be bloated by larger binary data that is packaged into the PyPI installs. If the data is not packaged into the PyPI install, it does not help the user much, since our examples are then not easily executable.

cchwala commented 2 weeks ago

When the proposed implementation is done, the existing example data in the repo has to be consolidated. There is data in the tests dir and I am about to add new example data in the notebooks dir in #41 because I do not want to hardcode the paths for getting data from the tests dir.