Closed by clnsmth 3 years ago
One possibility is developing the function template_provenance() with the arguments:

- path: (character) Path to the metadata templates directory where the provenance.txt table will be written.
- system.id: (character) Identifier of the system(s) in which the data object resides. Default is NULL.
- package.id: (character) Identifier of the data package(s) in the corresponding system.id. Default is NULL.

... and behavior:

- If only path is supplied, then a blank template is written.
- If a system.id and package.id pairing is supplied, then the function downloads and parses the relevant metadata into the fields described below and writes the template to path.

Fields and definitions:

- systemID: System in which the data reside
- packageID: Data package identifier
- title: Data package/resource title
- givenName: Given name
- surName: Surname
- role: Role (expected values are creator and contact)
- organizationName: Organization name
- email: Email address
- onlineDescription: Description of the online distribution
- url: URL of the data resource
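To make the proposal concrete, here is a minimal R sketch of the described behavior. It is not the package's implementation. The fetch_metadata argument is a hypothetical hook added only for illustration: it stands in for the download-and-parse step, which this thread does not specify. The tab-delimited layout of provenance.txt is also an assumption.

```r
# Column names follow the "Fields and definitions" list in this thread.
provenance_fields <- c(
  "systemID", "packageID", "title", "givenName", "surName", "role",
  "organizationName", "email", "onlineDescription", "url"
)

# fetch_metadata is a hypothetical hook (not an existing function). It should
# accept (system.id, package.id) and return a data.frame with the columns above.
template_provenance <- function(path, system.id = NULL, package.id = NULL,
                                fetch_metadata = NULL) {
  # The two identifiers only make sense as a pair.
  if (xor(is.null(system.id), is.null(package.id))) {
    stop("system.id and package.id must be supplied together.")
  }

  # Zero-row table, so that a blank template still carries the header row.
  template <- as.data.frame(
    setNames(
      replicate(length(provenance_fields), character(0), simplify = FALSE),
      provenance_fields
    ),
    stringsAsFactors = FALSE
  )

  # With an identifier pairing, fill the table from the supplied hook.
  if (!is.null(system.id)) {
    if (is.null(fetch_metadata)) {
      stop("A fetch_metadata function is required to populate the template.")
    }
    template <- fetch_metadata(system.id, package.id)[, provenance_fields,
                                                      drop = FALSE]
  }

  # Assumed format: tab-delimited text with a header row.
  out <- file.path(path, "provenance.txt")
  write.table(template, out, sep = "\t", row.names = FALSE, quote = FALSE,
              na = "")
  invisible(out)
}
```

Calling template_provenance(path) alone writes a header-only table that can be filled in by hand for external sources; supplying both identifiers and a fetch function writes one row per record that the function returns.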
Based on a specific dataset: The EDI metadata template has a table for documenting provenance that asks for: dataset title, URL or DOI, creator (name and e-mail), and contact (name and e-mail). Looking at the suggested fields, I am not sure what would go into systemID or packageID, since the only 'ID' for most of the outside datasets is the URL. Many don't even have a URL/DOI if they were just provided by the author of a paper. Some of the creators are organizations, and none seem to have an e-mail.
Thanks for this info @cgries. As you suggest, the url field makes systemID and packageID redundant as template fields, but they can still be used as function arguments to auto-populate this template.
Support provenance tracking of data sources located in the Environmental Data Initiative (EDI) repository as well as external sources. Provenance EML can be harvested via the PASTA+ API if the source data package is known. Required fields of a provenance node can be found in this example. Perhaps a template file is the best method for collecting provenance info of an external source?
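As a rough illustration of harvesting from PASTA+ when the source package is known, the R sketch below only builds the request URL and makes no network call. The endpoint pattern (package/provenance/eml/{scope}/{identifier}/{revision}) is written from memory and should be checked against the PASTA+ API documentation. The helper name pasta_provenance_url and the package ID edi.100.1 are hypothetical.

```r
# Build the URL for a provenance metadata request.
# Assumption: the endpoint pattern below matches the PASTA+ API; verify it
# against the official documentation before relying on it.
pasta_provenance_url <- function(package.id,
                                 base = "https://pasta.lternet.edu/package") {
  # A package ID is expected in the form "scope.identifier.revision".
  parts <- strsplit(package.id, ".", fixed = TRUE)[[1]]
  if (length(parts) != 3) {
    stop("package.id must look like 'scope.identifier.revision'.")
  }
  paste(base, "provenance", "eml", parts[1], parts[2], parts[3], sep = "/")
}

# Hypothetical package ID, used only to show the URL shape.
pasta_provenance_url("edi.100.1")
```

The returned string could then be passed to an HTTP client to retrieve the provenance EML for insertion into the derived package's metadata.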