LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.
https://childproject.readthedocs.io
MIT License

Documenting datasets #207

Closed lucasgautheron closed 2 years ago

lucasgautheron commented 3 years ago

Is your feature request related to a problem? Please describe.

Datasets always need to be documented. Documentation may include information about:

Ideally, some of the documentation should be machine-readable in order to improve discoverability. Machine-readability may also be exploited by DataLad's metadata extractors.

For instance, GIN uses the DataCite scheme (written in YAML) to generate the DOI and the metadata associated with it: https://gin.g-node.org/G-Node/Info/wiki/DOIfile#creating-a-datacite-metadata-file.

The variables can also be documented in machine-readable formats. The most obvious candidates are CSV, YAML, or XML. However, some of this information likely won't fit in rigid structures. For such information, we should probably encourage people to use formats such as Markdown rather than docx.
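As a rough illustration of the CSV option, here is a minimal sketch of a machine-readable data dictionary and a check that flags undocumented variables. The column names (`variable`, `description`, `type`, `required`), the example rows, and the `load_variables` helper are assumptions made up for this example, not an existing ChildProject format or API.

```python
import csv
import io

# Hypothetical data dictionary: one row per variable of a metadata table.
# The column names and the example variables are illustrative assumptions.
VARIABLES_CSV = """variable,description,type,required
child_id,Unique identifier of the child,string,yes
child_dob,Date of birth of the child (YYYY-MM-DD),date,yes
notes,Free-text remarks about the recording,string,no
"""

REQUIRED_COLUMNS = {"variable", "description", "type", "required"}


def load_variables(text):
    """Parse the data dictionary and index its rows by variable name."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    variables = {}
    for row in reader:
        # A variable without a description defeats the purpose of the file.
        if not row["description"].strip():
            raise ValueError(f"{row['variable']} has no description")
        variables[row["variable"]] = row
    return variables


variables = load_variables(VARIABLES_CSV)

# Columns actually found in a dataset (made up here) that lack documentation.
undocumented = {"child_id", "child_sex"} - set(variables)
print(sorted(undocumented))
```

A file like this stays easy to edit by hand, and anything that does not fit the columns can still go into a free-form Markdown document next to it.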

Describe the solution you'd like

alecristia commented 3 years ago

this sounds great, and I second the solution.

I also wonder whether we want to add something nobody includes but will be increasingly necessary, I think: the proof of ethical permission for the data collection & sharing, and a sample consent form. That will definitely not be machine-readable for now.

How about contact information for the authors? For EL1000, see table under this header. Author contact info should stay with the data, and can change too (e.g., if someone retires).

lucasgautheron commented 3 years ago

Regarding authorship, can we use this format? https://gin.g-node.org/G-Node/Info/wiki/DOIfile#creating-a-datacite-metadata-file

On GIN, once this file has been created, the information is shown at the bottom of the repository's main page; see here for instance: https://gin.g-node.org/LAAC-LSCP/managing-storing-sharing-paper
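To make the authorship proposal concrete, here is a minimal sketch that renders an `authors` block as YAML text. The key names (`firstname`, `lastname`, `affiliation`) are assumptions modeled on the GIN template and should be checked against the wiki page linked above. The people listed are placeholders, and `render_authors` is a hypothetical helper, not part of ChildProject or GIN.

```python
import json


def yaml_scalar(value):
    # JSON double-quoted strings are also valid YAML scalars,
    # which avoids depending on a YAML library for this sketch.
    return json.dumps(value, ensure_ascii=False)


def render_authors(authors):
    """Render a list of author dicts as a YAML 'authors' block."""
    lines = ["authors:"]
    for author in authors:
        for index, (key, value) in enumerate(author.items()):
            # The first key of each author opens a new list item.
            prefix = "  - " if index == 0 else "    "
            lines.append(f"{prefix}{key}: {yaml_scalar(value)}")
    return "\n".join(lines) + "\n"


# Placeholder people, not real contributors.
authors = [
    {"firstname": "Jane", "lastname": "Doe", "affiliation": "Example University"},
    {"firstname": "John", "lastname": "Smith", "affiliation": "Example Institute"},
]
authors_yaml = render_authors(authors)
print(authors_yaml)
```

Since the file is versioned with the dataset, updating a contact when someone moves or retires is a one-line change that stays with the data.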