Metadata - Githubissues

mankoff commented 3 years ago

DataLad supports metadata: http://docs.datalad.org/en/stable/metadata.html

Which format should we use? Pros and cons here...

For @jmlea16 and all.

AdrienWehrle commented 3 years ago

Also, how should we handle the metadata collection? A combination of manual entries and more automated extractions through e.g. metadata extractors? Automation is nice, but we need to stick to the chosen convention.

mankoff commented 3 years ago

I'd love to automate this, but we as a community are not providing the right metadata to do so. I think we'll need to do it manually. Given that we'll start with 5 or 25 datasets, that's easy to do. Even if we grow to a few hundred, manual entry is manageable at the dataset level. It would be nice if third parties provided enough file-level metadata that we could fetch, for example, specific velocity maps by date or time, or Landsat scenes by cloud cover %. But for now, just dataset-level stuff.

What do we want to track?

- [ ] Source data: DOI, reference, URL
- [ ] Science reference & DOI
- [ ] Organization & project
- [ ] Geospatial ROI
  - [ ] Is this a list of names? E.g. Antarctica, Greenland, Leverett Glacier, Sermeq Kujalleq, Swiss Camp, UPE_U, QAS_L
  - [ ] Is this a polygon? Following ISO 19107 (https://www.iso.org/obp/ui/#iso:std:iso:19107:en)?

Maybe keywords from a controlled vocabulary, for example "velocity", "biology", "ice", "ocean", "atmosphere", "temperature", etc.?
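A rough sketch of how a controlled vocabulary could be enforced at entry time, in case it helps the discussion. The vocabulary contains only the example terms listed above; the names `CONTROLLED_KEYWORDS` and `check_keywords` are hypothetical and not part of DataLad or any agreed convention.

```python
# Hypothetical controlled vocabulary, seeded with the example terms from this thread.
CONTROLLED_KEYWORDS = {
    "velocity", "biology", "ice", "ocean", "atmosphere", "temperature",
}


def check_keywords(keywords):
    """Normalize keywords and reject any that are outside the vocabulary.

    Returns the keywords lower-cased and stripped of surrounding whitespace.
    Raises ValueError listing every term that is not in CONTROLLED_KEYWORDS.
    """
    normalized = [k.strip().lower() for k in keywords]
    unknown = sorted(set(normalized) - CONTROLLED_KEYWORDS)
    if unknown:
        raise ValueError(f"Keywords not in controlled vocabulary: {unknown}")
    return normalized
```

Something like `check_keywords(["Velocity", "ice"])` would pass, while an unlisted term would raise an error, which keeps manual entries consistent without needing any automated extraction.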

AdrienWehrle commented 3 years ago

Why not JSON, or any other human-readable object notation? We could use nested objects. For the ROI, for example, the name would be the standard field, and one could nest another object inside it for the polygon.
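To make the nesting idea concrete, here is a minimal sketch of what such a record might look like, built in Python and serialized with the standard `json` module. All field names (`source`, `science_reference`, `roi`, `polygon`, and so on) are assumptions based on the checklist above, not a chosen convention, and the GeoJSON-style polygon layout is just one possible choice. The coordinates are placeholders, not a real outline.

```python
import json


def make_roi(name, polygon_lonlat=None):
    """Build an ROI object: the name is always present, the polygon is optional.

    polygon_lonlat is a list of [lon, lat] pairs forming a closed ring
    (hypothetical layout, loosely following GeoJSON).
    """
    roi = {"name": name}
    if polygon_lonlat is not None:
        roi["polygon"] = {"type": "Polygon", "coordinates": [polygon_lonlat]}
    return roi


# Placeholder ring only; these are NOT real coordinates for any site.
placeholder_ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]

# Hypothetical dataset-level record; None marks values to be filled in manually.
record = {
    "source": {"doi": None, "reference": None, "url": None},
    "science_reference": {"citation": None, "doi": None},
    "organization": None,
    "project": None,
    "roi": make_roi("Swiss Camp", placeholder_ring),
    "keywords": ["ice", "temperature"],
}

text = json.dumps(record, indent=2)
print(text)
```

A name-only ROI would simply be `make_roi("Greenland")`, so datasets without a polygon stay valid under the same structure.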

cryo-data / discuss

Metadata #6