IQSS / dataverse

Open source research data repository software
http://dataverse.org

Feature Request/Idea: Create Machine Learning Metadata Block #10630

Open amberleahey opened 1 week ago

amberleahey commented 1 week ago

Overview of the Feature Request: This new metadata block would be optional and would include new metadata fields for describing machine learning research data, such as ML models, tasks, sources, etc.

What kind of user is the feature intended for? Depositors, curators, administrators managing machine learning datasets and collections. Users can describe the "number of instances", "task type", and other technical details related to machine learning models and programs.
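To make the idea of standardized fields concrete, here is a minimal sketch of how two of the fields mentioned above might be modeled and checked. Everything in it is an assumption for illustration: the field names `mlNumberOfInstances` and `mlTaskType`, the task-type vocabulary, and the `FieldSpec` and `validate` helpers are hypothetical, and this is not the actual Dataverse metadata block format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str            # hypothetical field identifier
    title: str           # label a depositor would see
    field_type: str      # "int" or "text"
    allowed: tuple = ()  # optional controlled vocabulary (assumed values)


# Hypothetical machine learning block with two of the fields named in the request.
ML_BLOCK = (
    FieldSpec("mlNumberOfInstances", "Number of Instances", "int"),
    FieldSpec("mlTaskType", "Task Type", "text",
              allowed=("classification", "regression", "clustering")),
)


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record is acceptable.

    The block is optional, so missing fields are not treated as errors.
    """
    known = {spec.name: spec for spec in ML_BLOCK}
    errors = []
    for key, value in record.items():
        spec = known.get(key)
        if spec is None:
            errors.append(f"unknown field: {key}")
        elif spec.field_type == "int" and (
            not isinstance(value, int) or isinstance(value, bool) or value < 0
        ):
            errors.append(f"{key} must be a non-negative integer")
        elif spec.allowed and value not in spec.allowed:
            errors.append(f"{key} must be one of {', '.join(spec.allowed)}")
    return errors
```

For example, `validate({"mlNumberOfInstances": 150, "mlTaskType": "classification"})` returns an empty list, while a task type outside the assumed vocabulary produces one error message.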

What inspired the request? This is inspired by Stephanie Labou's research on machine learning datasets in Harvard Dataverse and other repositories around the world: https://zenodo-rdm.web.cern.ch/records/11269191 (slide 11).

What existing behavior do you want changed? Add a new metadata block with standardized fields for describing machine learning data.

Any brand new behavior do you want to add to Dataverse? Add a new metadata block with standardized fields for describing machine learning data.

Any open or closed issues related to this feature request?

pdurbin commented 1 week ago

Very interesting. Over in this pull request...

... we are adding a new export format called Croissant that is geared toward machine learning. Perhaps the Croissant exporter (which has its own git repo at https://github.com/gdcc/exporter-croissant) can use some of the fields from the new machine learning metadata block when it's available.

For more on Croissant, please see its website at https://github.com/mlcommons/croissant, some unmerged docs I wrote for the pull request above, and the threads I started on the Google Group and Zulip.