Open cchwala opened 2 weeks ago
When the proposed implementation is done, the existing example data in the repo has to be consolidated. There is data in the tests dir and I am about to add new example data in the notebooks dir in #41 because I do not want to hardcode the paths for getting data from the tests dir.
Is your feature request related to a problem? Please describe.
Currently we either use curl -OL some_url to download example data in the notebooks, or we use data from the tests/test_data directory. Neither is ideal, in particular if we want to add more datasets and also make access to some large ones easy.
Describe the solution you'd like
I suggest adding a module src/poligrain/data.py (or example_data.py, or maybe datasets.py) which would contain functions that could be used like this: get_example_data('AMS PWS small') or get_example_data('AMS PWS full').
As a first step, we would add very small example datasets (at most a few hundred KB) to the repo, which would be accessed by the function. Later we could add a separate repo for small example data.
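To make the first step more concrete, here is a minimal sketch of what such a function could look like. It is only an illustration under assumptions: the file name, the example_data folder and the data_dir parameter are hypothetical, and the function returns a file path rather than loaded data, since the return type is not decided here.

```python
from pathlib import Path

# Hypothetical registry: the dataset name follows this issue,
# the file name is made up for illustration.
EXAMPLE_DATASETS = {
    "AMS PWS small": "ams_pws_small.nc",
}


def get_example_data(name, data_dir=None):
    """Return the path of a small example dataset stored in the repo."""
    if name not in EXAMPLE_DATASETS:
        available = ", ".join(sorted(EXAMPLE_DATASETS))
        raise KeyError(f"Unknown dataset {name!r}. Available: {available}")
    if data_dir is None:
        # Assumed layout: the data files live in a folder next to this module.
        data_dir = Path(__file__).resolve().parent / "example_data"
    path = Path(data_dir) / EXAMPLE_DATASETS[name]
    if not path.is_file():
        raise FileNotFoundError(f"Example data file is missing: {path}")
    return path
```

Notebooks and tests could then both call get_example_data('AMS PWS small') instead of hardcoding paths. Loading the file into a data structure would be a separate step on top of this.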
In addition, we could download full datasets directly from Zenodo. Some data transformation would be required, though. There was preliminary work on this, e.g. in the ragali_prototpy here (which is more or less a 100% copy-paste of the same code in the OPENSENSE software sandbox, if I recall correctly). Hence, I would leave the access to full datasets for later.
Describe alternatives you've considered
Just continue adding data to the repo... ;-). But since we are building a package that should serve as the basis for other packages, it should not be bloated by larger binary data that is packaged into the PyPI installs. And if the data is not packaged into the PyPI install, it does not help the user much, since our examples would then not be easily executable.