gothub closed 7 years ago
So this new function will be in the R DataONE package, right?
Yes, that is the plan, unless someone has reasons why it should be in another package.
I think this would be best in datapack::DataPackage, as that is the container for the RDF and all of the components. It especially makes sense as x is of class DataPackage, and would allow the provenance info to be added to the package just like the existing statements are, and is similar to the existing datapack::insertRelationship() method. datapack is already RDF and provenance aware. This new method would be a higher-level version of that, but would insert multiple relationships at once.
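To make the "higher-level version that inserts multiple relationships at once" idea concrete, here is a minimal sketch in base R. It does not use datapack: a plain data frame stands in for the package's relationship store, and the function names (new_relations, insert_one, insert_run_relationships), the example identifiers, and the exact set of predicates are assumptions made for illustration. The statements actually required by the indexer may differ.

```r
# Stand-in for a package's relationship store: one row per triple.
# (Illustrative only; the real DataPackage class keeps its own RDF model.)
new_relations <- function() {
  data.frame(subject = character(), predicate = character(),
             object = character(), stringsAsFactors = FALSE)
}

# Low-level step: relate one subject to each of several objects.
insert_one <- function(rels, subject, objects, predicate) {
  if (length(objects) == 0) return(rels)
  rbind(rels, data.frame(subject = rep(subject, length(objects)),
                         predicate = rep(predicate, length(objects)),
                         object = objects, stringsAsFactors = FALSE))
}

# Higher-level step (hypothetical name): insert all relationships for one run
# with a single call instead of one call per statement.
insert_run_relationships <- function(rels, execution, inputs, outputs) {
  rels <- insert_one(rels, execution, inputs, "prov:used")
  for (out in outputs) {
    rels <- insert_one(rels, out, execution, "prov:wasGeneratedBy")
    rels <- insert_one(rels, out, inputs, "prov:wasDerivedFrom")
  }
  rels
}

rels <- insert_run_relationships(new_relations(), "urn:uuid:exec-1",
                                 c("in1.csv", "in2.csv"), "out.csv")
print(rels)
```

The caller passes identifiers only, which matches the requirement that the function work with vectors of identifiers.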
Are inputFiles and outputFiles intended to be vectors of files, vectors of identifiers, or both? We definitely need to be able to do it via vectors of identifiers. If so, maybe the parameters should be renamed to inputIdentifiers and outputIdentifiers? If they are files, does the function add the files to the package as well? Needs discussion.
Yes, datapack stores the prov relationships, but I don't think it should contain knowledge of the ProvONE data model, e.g. that an execution is linked to a plan via a qualified association. If the data model changes, dataone would then depend on datapack being changed as well.
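As an illustration of the kind of data-model knowledge in question, the sketch below builds the statements that link an execution to its plan through a qualified association. It is base R with a plain data frame, not the datapack or dataone API; the function name link_execution_to_plan and the example identifiers are hypothetical, and the predicate and class names are written as prefixed strings on the assumption that they follow PROV-O and ProvONE.

```r
# Build the triples that tie an execution to its program (the "plan")
# through an intermediate association node.
link_execution_to_plan <- function(execution, association, program) {
  data.frame(
    subject   = c(execution, association, execution, association, program),
    predicate = c("prov:qualifiedAssociation", "prov:hadPlan",
                  "rdf:type", "rdf:type", "rdf:type"),
    object    = c(association, program,
                  "provone:Execution", "prov:Association", "provone:Program"),
    stringsAsFactors = FALSE
  )
}

# Example identifiers are placeholders.
plan_links <- link_execution_to_plan("urn:uuid:exec-1",
                                     "urn:uuid:assoc-1",
                                     "urn:uuid:script-1")
print(plan_links)
```

Whichever package owns a function like this is the one that has to change if the model changes, which is the dependency concern raised above.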
Regarding inputFiles and outputFiles: this function handles the use case where a user has a collection of files that are the artifacts of an execution that has already run. The function would be used to build a DataPackage from the script, input files, and output files for such a run, so it would take care of assigning pids to DataObjects.
Maybe there are other use cases that we need to consider.
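For the files-based use case described above, the pid assignment step could look roughly like the following base R sketch. The helper names assign_pids and describe_run are hypothetical, the identifier scheme is a random hex string used as a placeholder rather than a real UUID generator, and no DataObject or DataPackage is actually created here.

```r
# Give each file a fresh identifier (placeholder scheme, not a true UUID).
assign_pids <- function(files, prefix = "urn:uuid:") {
  make_id <- function() {
    paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE), collapse = "")
  }
  # vapply over a character vector keeps the file names as element names.
  vapply(files, function(f) paste0(prefix, make_id()), character(1))
}

# Tabulate the members of one run: a single script plus its inputs and outputs.
describe_run <- function(script, inputs, outputs) {
  files <- c(script, inputs, outputs)
  data.frame(file = files,
             role = c("program",
                      rep("input", length(inputs)),
                      rep("output", length(outputs))),
             pid = unname(assign_pids(files)),
             stringsAsFactors = FALSE)
}

members <- describe_run("analysis.R", c("raw1.csv", "raw2.csv"), "summary.csv")
print(members)
```

A table like this gives the identifiers that an identifier-based provenance function would then consume.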
This functionality was added to datapack in commit 9afb17209134d5b5e4a3d7061daed333835f86ac.
As described in https://github.com/DataONEorg/sem-prov-design/issues/228, a function will be added that will insert provenance relationships into a DataPackage for a script and the files that it has read and written. The proposed call:
with parameters:
The function will insert the provenance relationships that are required by the DataONE RDF/XML indexing subprocessor in order for the prov relationships to be properly indexed.
Where should this function be placed in the DataONE package? It doesn't really fit in D1Client.R, so do we need a new S4 class?
It doesn't quite make sense to place this in the R datapack package either: we are modifying a DataPackage here, but these are DataONE provenance relationships that are being added, which the R DataONE package should have knowledge of, not datapack.