amberleahey opened this issue 5 months ago (status: Open)
Very interesting. Over in this pull request...
... we are adding a new export format called Croissant that is geared toward machine learning. Perhaps the Croissant exporter (which has its own git repo at https://github.com/gdcc/exporter-croissant) can use some of the fields from the new machine learning metadata block when it's available.
For more on Croissant, please see its website at https://github.com/mlcommons/croissant, some unmerged docs I wrote for the pull request above, and the threads I started on the Google Group and Zulip.
Overview of the Feature Request: This new metadata block would be optional and would include metadata fields for describing machine learning research data, such as ML models, tasks, and sources.
What kind of user is the feature intended for? Depositors, curators, and administrators managing machine learning datasets and collections. Users could describe the "number of instances", "task type", and other technical details related to machine learning models and programs.
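To make the idea concrete, here is a rough sketch of how two of the fields mentioned above might look once filled in for a dataset. Everything here is an assumption for illustration: the field names `mlNumberOfInstances` and `mlTaskType`, the block display name, and the helper functions are hypothetical, and the entry shape (`typeName`, `multiple`, `typeClass`, `value`) is only modeled loosely on how Dataverse represents metadata fields in dataset JSON. The actual block may define different fields and types.

```python
import json


def ml_field(type_name, value):
    """Build one single-valued, primitive field entry (hypothetical shape)."""
    return {
        "typeName": type_name,
        "multiple": False,
        "typeClass": "primitive",
        "value": str(value),
    }


def build_ml_block(number_of_instances, task_type):
    """Assemble a hypothetical machine learning metadata block."""
    if number_of_instances < 0:
        raise ValueError("number_of_instances must be non-negative")
    return {
        # Display name is an assumption, not a name taken from the proposal.
        "displayName": "Machine Learning Metadata",
        "fields": [
            ml_field("mlNumberOfInstances", number_of_instances),
            ml_field("mlTaskType", task_type),
        ],
    }


# Example: a dataset with 60,000 instances intended for classification.
block = build_ml_block(60000, "classification")
print(json.dumps(block, indent=2))
```

Keeping each field as a small, uniform entry would make it straightforward for an exporter to look fields up by `typeName`, though whether any exporter would consume them this way is also an assumption.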
What inspired the request? This is inspired by Stephanie Labou's research on machine learning datasets in Harvard Dataverse and other repositories around the world: https://zenodo-rdm.web.cern.ch/records/11269191 (see slide 11).
What existing behavior do you want changed? Add a new metadata block with standardized fields for describing machine learning data.
Any brand new behavior do you want to add to Dataverse? Add a new metadata block with standardized fields for describing machine learning data.
Any open or closed issues related to this feature request?