Open juansensio opened 8 months ago
For Q0 datasets this can already be done.
For Q2+ datasets, we should enable a new item type in the specification which allows linking the source code as assets. @fmariv can take a look at this.
Versioning is automatically supported.
This sounds interesting. I think we should consider adding this new feature to any STAC object (Catalog, Collection, Item), depending on the case. A new `ml-dataset:provenance` field should be added, composed of STAC link objects that point to the script or notebook, either as a local path or as a URL. We should also consider adding a new media type for STAC links, such as `code`, `script`, or something similar.
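To make the proposal concrete, here is a minimal sketch of what such a field could look like on a Collection. Only the field name `ml-dataset:provenance` comes from this thread; the `rel` value `source-code`, the media type strings, the helper name `provenance_link`, and the example paths and URL are all placeholders I am assuming for illustration, not part of any agreed specification.

```python
import json


def provenance_link(href, title, media_type):
    """Build a STAC-style link object pointing at a script or notebook.

    The "rel" value is a hypothetical placeholder; the thread has not
    settled on a relation type or on a media type for source code.
    """
    return {
        "rel": "source-code",  # assumption, not defined by STAC or this issue
        "href": href,          # local path or URL
        "type": media_type,    # placeholder until a media type is agreed
        "title": title,
    }


# A stripped-down Collection dict; real Collections carry more required fields.
collection = {
    "type": "Collection",
    "id": "example-collection",
    "description": "Example collection with provenance links",
    "links": [],
    # Field name taken from the proposal above; its placement is an assumption.
    "ml-dataset:provenance": [
        provenance_link(
            "./scripts/generate_dataset.py",  # hypothetical local script
            "Dataset generation script",
            "text/x-python",
        ),
        provenance_link(
            "https://example.com/notebooks/explore.ipynb",  # hypothetical URL
            "Exploration notebook",
            "application/x-ipynb+json",
        ),
    ],
}

serialized = json.dumps(collection, indent=2)
print(serialized)
```

Reusing the link object shape keeps `href`, `type`, and `title` familiar to anyone who already reads STAC links, which is why the sketch follows it rather than defining a new structure.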
Use cases:
- Should the `ml-dataset:provenance` objects inside the Collections also be placed in the root Catalog? This may duplicate data, but it gives a holistic view of the catalog provenance.
- `ml-dataset:provenance` should contain the title/name of the asset, just as a link does, and a way to define it as part of the extension.
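On the first question, one option is to copy the Collection-level entries up into the root Catalog while dropping repeated `href` values, which limits the duplication. The sketch below assumes each provenance entry is a dict with an `href` key; the function name `aggregate_provenance` and the sample data are hypothetical.

```python
def aggregate_provenance(catalog, collections, key="ml-dataset:provenance"):
    """Return a copy of `catalog` whose provenance list merges every
    Collection's entries, keeping the first occurrence of each href."""
    seen = set()
    merged = []
    for collection in collections:
        for link in collection.get(key, []):
            if link["href"] not in seen:
                seen.add(link["href"])
                merged.append(dict(link))  # copy so the Collections stay untouched
    result = dict(catalog)
    result[key] = merged
    return result


# Hypothetical sample data: two Collections sharing one script.
collections = [
    {
        "id": "train",
        "ml-dataset:provenance": [
            {"href": "./scripts/generate.py", "title": "Generator"},
            {"href": "./notebooks/train.ipynb", "title": "Train split notebook"},
        ],
    },
    {
        "id": "test",
        "ml-dataset:provenance": [
            {"href": "./scripts/generate.py", "title": "Generator"},
        ],
    },
]

root = aggregate_provenance({"type": "Catalog", "id": "root"}, collections)
print([link["href"] for link in root["ml-dataset:provenance"]])
```

The Collections keep their own entries, so the Catalog-level list is a derived summary that can be regenerated at any time instead of being maintained by hand.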
Posted by @dmoglioni
USER STORY - notebook/ingestion/timeseries/versioning/data provenance
A user codes a script or notebook that generates a dataset as output (e.g. a time series) and then wants to ingest both the data and the versioned source code into EOTDL.