DataONEorg / sem-prov-design

Design documents for the Semantics and Provenance Working Group, DataONE Phase II
Apache License 2.0

design database index for provenance information in Matlab and R #147

Open mbjones opened 9 years ago

mbjones commented 9 years ago

Matlab will need to index the provenance metadata generated during executions. This could be done in a text file, but will likely perform better if a database is accessible. If we decide to use a database for recording runs, it would be best if the Matlab and R clients shared a schema and approach.

Matlab has a Database Toolbox, but many users won't have access to it. Java's JDBC can be used directly from within Matlab, so the Database Toolbox is probably not needed. There is a discussion of database access techniques on Stack Overflow, as well as a thread on open-source alternatives to the Database Toolbox. I think these would allow SQLite and PostgreSQL to be options.

Discuss these options with @csjx, @sbpcs59, and @sycao5 to reach a decision, and document the design.

sycao5 commented 9 years ago

Thanks for documenting this issue.

gothub commented 9 years ago

Both recordr and matlab-dataone save the same 'run metadata' information to a text file, which includes the fields:

runId, tag, software_application, startTime, endTime, publishedTime, packageId, errorMessage

These fields make it possible to search for runs of interest and then view the science data and provenance relationships for each run. The data for a run are stored in a directory named with the run's unique runId, which is a sub-directory of the location indicated by the session configuration parameter _provenance_storagedirectory.
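As a rough illustration of the lookup described above, here is a minimal Python sketch (not the recordr or matlab-dataone code). It assumes the text file is comma-separated with the fields in the order listed, which this thread does not confirm. The function names `find_runs` and `run_directory`, the sample rows, and the storage path are all hypothetical.

```python
import csv
import io
from pathlib import Path

# Field names as listed in this comment; the comma-separated layout
# and the field order are assumptions made for this sketch.
FIELDS = ["runId", "tag", "software_application", "startTime",
          "endTime", "publishedTime", "packageId", "errorMessage"]


def find_runs(metadata_text, **criteria):
    """Scan every row and keep those whose fields equal all criteria."""
    reader = csv.DictReader(io.StringIO(metadata_text), fieldnames=FIELDS)
    return [row for row in reader
            if all(row[key] == value for key, value in criteria.items())]


def run_directory(storage_dir, run_id):
    """Build the path of the sub-directory named after the runId."""
    return Path(storage_dir) / run_id


# Hypothetical sample content of the run metadata file.
sample = (
    "run-1,test,script_a.R,2015-01-01T10:00:00,2015-01-01T10:05:00,,,\n"
    "run-2,paper,script_b.m,2015-01-02T09:00:00,2015-01-02T09:30:00,,,\n"
)

matches = find_runs(sample, tag="paper")
paths = [run_directory("/tmp/prov", row["runId"]) for row in matches]
```

Every search reads the whole file, which is the cost an indexed database table would avoid.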

The run metadata file could be reimplemented as a database table with the runId as the primary key, and other fields indexed as appropriate. The schema for the database will be detailed in

https://github.com/DataONEorg/sem-prov-design/blob/master/docs/PROV-capture/Run-manager-API.rst
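Until that document is written, the following is only a sketch of what such a table could look like, using SQLite through Python's standard `sqlite3` module. The table name `run_metadata`, the index names, the choice of `tag` and `startTime` as indexed fields, and the TEXT column types are assumptions for illustration, not the agreed schema.

```python
import sqlite3

# Hypothetical DDL: columns mirror the run metadata fields listed above.
# Table name, index names, and column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS run_metadata (
    runId                TEXT PRIMARY KEY,
    tag                  TEXT,
    software_application TEXT,
    startTime            TEXT,
    endTime              TEXT,
    publishedTime        TEXT,
    packageId            TEXT,
    errorMessage         TEXT
);
CREATE INDEX IF NOT EXISTS idx_run_tag ON run_metadata (tag);
CREATE INDEX IF NOT EXISTS idx_run_start ON run_metadata (startTime);
"""


def open_run_db(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, run):
    """Insert one run; `run` is a dict with a key for every column."""
    conn.execute(
        "INSERT INTO run_metadata VALUES "
        "(:runId, :tag, :software_application, :startTime, "
        ":endTime, :publishedTime, :packageId, :errorMessage)",
        run,
    )
    conn.commit()


def runs_with_tag(conn, tag):
    """Return the runIds carrying a tag, ordered by start time."""
    rows = conn.execute(
        "SELECT runId FROM run_metadata WHERE tag = ? ORDER BY startTime",
        (tag,),
    )
    return [row[0] for row in rows]
```

Because the schema is plain SQL, the same statements could in principle be issued from Matlab over JDBC and from R, which is one way the two clients might share a schema.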

sycao5 commented 9 years ago

Hi Peter,

Thank you for your comments. I will check my code with the requirements.

Yang