EDIorg / EMLassemblyline

R package for creating EML metadata
https://ediorg.github.io/EMLassemblyline/
MIT License

Provenance tracking: Add methods for EDI data packages or external sources #8

Closed: clnsmth closed this issue 3 years ago

clnsmth commented 6 years ago

Support provenance tracking of data sources located in the Environmental Data Initiative (EDI) repository as well as external sources. Provenance EML can be harvested via the PASTA+ API if the source data package is known. Required fields of a provenance node can be found in this example. Perhaps a template file is the best method for collecting provenance info of an external source?
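As a rough illustration of the harvesting idea, the sketch below only builds the request URL for a known source data package. It assumes the PASTA+ provenance endpoint follows the pattern `package/provenance/eml/{scope}/{identifier}/{revision}` under `https://pasta.lternet.edu`; the function name `provenance_url_sketch()` and the package identifier in the usage note are hypothetical.

```r
# Hypothetical helper, not part of EMLassemblyline.
# Assumed endpoint pattern (check against the PASTA+ API documentation):
#   <base>/package/provenance/eml/{scope}/{identifier}/{revision}
provenance_url_sketch <- function(package.id,
                                  base = "https://pasta.lternet.edu") {
  # A package ID is expected to look like "scope.identifier.revision"
  parts <- strsplit(package.id, ".", fixed = TRUE)[[1]]
  if (length(parts) != 3) {
    stop("package.id must look like 'scope.identifier.revision'")
  }
  # Join the base URL, the fixed path segments, and the three ID parts
  paste(c(base, "package", "provenance", "eml", parts), collapse = "/")
}

# Fetching is left commented out because it needs network access:
# eml <- readLines(provenance_url_sketch("edi.100.1"), warn = FALSE)
```

Calling `provenance_url_sketch("edi.100.1")` (an illustrative identifier) returns the URL from which the provenance EML fragment could then be read.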

clnsmth commented 5 years ago

One possibility is developing the function template_provenance() with the arguments:

path (character): Path to the metadata templates directory where the provenance.txt table will be written.
system.id (character): Identifier of the system(s) in which the data object resides. Default is NULL.
package.id (character): Identifier of the data package(s) in the corresponding system.id. Default is NULL.

... and behavior:

Example provenance.txt

Fields and definitions:
systemID: System in which the data reside
packageID: Data package identifier
title: Data package/resource title
givenName: Given name
surName: Surname
role: Role (a creator and a contact are expected)
organizationName: Organization name
email: Email address
onlineDescription: Description of the online distribution
url: URL of the data resource
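A minimal sketch of how the proposed function could write this table, assuming a tab-delimited provenance.txt and one pre-filled row per supplied package.id. This is not the package's actual implementation; the name `template_provenance_sketch()` and the identifiers in the usage note are hypothetical.

```r
# Hypothetical sketch of the proposed template_provenance().
template_provenance_sketch <- function(path, system.id = NULL,
                                       package.id = NULL) {
  fields <- c("systemID", "packageID", "title", "givenName", "surName",
              "role", "organizationName", "email", "onlineDescription",
              "url")
  n <- length(package.id)
  if (!is.null(system.id) && length(system.id) != n) {
    stop("system.id and package.id must have the same length")
  }
  # Start from an all-empty character table, one row per package.id
  # (zero rows, header only, when no package.id is given)
  tbl <- as.data.frame(
    matrix("", nrow = n, ncol = length(fields),
           dimnames = list(NULL, fields)),
    stringsAsFactors = FALSE
  )
  # Pre-fill the identifier columns from the function arguments
  if (n > 0) {
    tbl$packageID <- package.id
    if (!is.null(system.id)) tbl$systemID <- system.id
  }
  # Assumption: the template is a tab-delimited text file
  out <- file.path(path, "provenance.txt")
  utils::write.table(tbl, out, sep = "\t", quote = FALSE,
                     row.names = FALSE)
  invisible(out)
}
```

For example, `template_provenance_sketch(tempdir(), system.id = "edi", package.id = "edi.100.1")` would write a header row plus one row with only systemID and packageID filled in, leaving the remaining fields for the user to complete.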

cgries commented 3 years ago

Based on a specific dataset: the EDI metadata template has a table for documenting provenance that asks for dataset title, URL or DOI, creator (name and e-mail), and contact (name and e-mail). So, looking at the suggested fields, I am not sure what would go into systemID or packageID, since the only 'ID' for most outside datasets is the URL. Many don't even have a URL/DOI if they were just provided by the author of a paper. Some of the creators are organizations, and none seem to have an e-mail.

clnsmth commented 3 years ago

Thanks for this info @cgries. As you suggest, the url field makes systemID and packageID redundant as template fields, but they can still be used as function arguments to auto-populate this template.