Samplers output standardization

LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.

https://childproject.readthedocs.io

MIT License


Samplers output standardization #148

Closed lucasgautheron closed 3 years ago

lucasgautheron commented 3 years ago

Is your feature request related to a problem? Please describe.

Goals:

We should be able to trace each sample back to the parameters that helped generate it, and to the version of the package that was used.
The samples themselves should be standardized.
Ideally, it should be possible to deliver the samples and their parameters with the dataset...
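To make the first goal concrete, here is a minimal sketch of a per-run record that ties a sample to its parameters and to the package version. The function names, the `sampler` and `package_version` keys, and the distribution name `"ChildProject"` are assumptions made for illustration, not part of the package's actual API.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or "unknown" if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "unknown"


def describe_run(sampler: str, parameters: dict, dist_name: str = "ChildProject") -> dict:
    """Build one flat record: sampler name, package version, then the sampler's parameters.

    The key names and the default distribution name are hypothetical.
    """
    record = {"sampler": sampler, "package_version": package_version(dist_name)}
    record.update(parameters)
    return record
```

A flat record like this can be written as a single row, so it can ship with the dataset next to the segments it describes.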

Describe the solution you'd like

For the storage:

We could use a nested tree, just as with annotations:

samples/
    vetted/
        segments.csv
    high-volubility/
        lena/
            segments.csv
            parameters.csv
        vtc/
            segments.csv
            parameters.csv
    energy1/
        segments.csv
        parameters.csv

alecristia commented 3 years ago

This assumes the name is interpretable, right? Wouldn't it make more sense to give it a name, yes, but also to append the parameters to the name, and/or add a random or sequential number (e.g. random2 the second time I generate a random sample), and/or include a YAML file of the parameters in the same folder?

lucasgautheron commented 3 years ago

Well, we could save the runs under samples/<name>/parameters_%Y-%m-%d_%H_%M_%S.csv and samples/<name>/segments_%Y-%m-%d_%H_%M_%S.csv. That would do the trick, right?
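A rough sketch of that naming scheme, using only the standard library. The `save_run` and `write_csv` helpers and the shape of the inputs (segments as a list of dicts, parameters as one dict) are assumptions for illustration, not the package's implementation.

```python
import csv
from datetime import datetime
from pathlib import Path

TIMESTAMP_FORMAT = "%Y-%m-%d_%H_%M_%S"  # same pattern as in the proposed file names


def write_csv(path, rows):
    """Write a non-empty list of dicts as a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def save_run(root, name, segments, parameters, now=None):
    """Save one sampler run under <root>/samples/<name>/ (hypothetical helper).

    Both files share a single timestamp, which is what links the
    parameters to the segments they produced.
    """
    stamp = (now or datetime.now()).strftime(TIMESTAMP_FORMAT)
    folder = Path(root) / "samples" / name
    folder.mkdir(parents=True, exist_ok=True)
    segments_path = folder / f"segments_{stamp}.csv"
    parameters_path = folder / f"parameters_{stamp}.csv"
    write_csv(segments_path, segments)
    write_csv(parameters_path, [parameters])  # parameters are stored as a single row
    return segments_path, parameters_path
```

Note that the timestamp has one-second resolution, so two runs of the same sampler started within the same second would still collide.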

alecristia commented 3 years ago

Yes, naming files that way ensures that nothing gets overwritten, and it connects parameters to data. It also leaves room for errors in our judgment about the parameters we have now versus the ones we may have in the future, and it allows "harvesting" of data and parameters, since both are CSVs (an alternative would have been YAML for the parameters, but the two sound equivalent to me in this case).
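For what the "harvesting" could look like in practice, here is a minimal sketch that walks a samples folder and pairs each parameters file with the segments file sharing its timestamp. The `harvest_runs` name and the returned dictionary keys are hypothetical, and the sketch assumes the parameters_<timestamp>.csv / segments_<timestamp>.csv naming discussed above.

```python
import csv
from pathlib import Path


def harvest_runs(samples_root):
    """Collect every run found under samples_root (hypothetical helper).

    A run is a parameters_<stamp>.csv file with a segments_<stamp>.csv
    file next to it; parameters files without a match are skipped.
    """
    runs = []
    for parameters_path in sorted(Path(samples_root).rglob("parameters_*.csv")):
        stamp = parameters_path.stem[len("parameters_"):]
        segments_path = parameters_path.with_name(f"segments_{stamp}.csv")
        if not segments_path.exists():
            continue
        with open(parameters_path, newline="") as f:
            parameters = next(csv.DictReader(f), {})  # first row, or {} if empty
        with open(segments_path, newline="") as f:
            segments = list(csv.DictReader(f))
        runs.append({
            # e.g. "high-volubility/lena", taken from the folder layout
            "sampler": parameters_path.parent.relative_to(samples_root).as_posix(),
            "timestamp": stamp,
            "parameters": parameters,
            "segments": segments,
        })
    return runs
```

Because the CSV reader returns every value as a string, any numeric parameters would need to be converted by the caller.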