CONP-PCNO / conp-dataset

A DataLad dataset for CONP
http://conp.ca
MIT License

Using ARKs for version control, initial draft proposal #887

Status: Open. emmetaobrien opened this issue 5 months ago.

emmetaobrien commented 5 months ago

Intended functionality:

1) A dataset's version changes when the data provider specifies that it should.
2) A new ARK is generated when the dataset version changes.
3) The ARKs of previous versions link to the appropriate GitHub commit.
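Taken together, the three requirements suggest that each archived version needs a record tying a version string, an ARK and a commit together. The sketch below is a minimal illustration under assumptions: `VersionRecord` and `target_url` are hypothetical names, and `owner`/`repo` are parameters because the proposal does not say whether the commit lives in conp-dataset or in the dataset's own repository. The `/commit/<sha>` path is GitHub's standard URL for a single commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRecord:
    """One archived dataset version (hypothetical structure, not an existing CONP type)."""

    version: str  # value of the DATS.json "version" field for this release
    ark: str      # ARK identifier minted when this version was created
    commit: str   # Git commit SHA that the ARK should resolve to

    def target_url(self, owner: str, repo: str) -> str:
        """GitHub URL of the commit this ARK should point at."""
        return f"https://github.com/{owner}/{repo}/commit/{self.commit}"


# Example values: NAAN 99999 is reserved for test ARKs; the SHA is made up.
record = VersionRecord(version="1.0", ark="ark:/99999/fk4example", commit="0123abc")
print(record.target_url("CONP-PCNO", "conp-dataset"))
# → https://github.com/CONP-PCNO/conp-dataset/commit/0123abc
```

Making the record frozen reflects the intent that an archived version, once minted, should not be edited in place.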

System needs to support the following use cases:

A) The dataset is updated with new scientific content.
B) Technical updates (e.g. extra properties added to DATS.json for CONP internal purposes) that make no difference to the scientific content of the dataset.
C) Removal of data from all versions of a dataset (e.g. patients withdrawing from a study).

Proposed implementation:

Define a location to store the archived ARKs for each dataset. This could be something like extraProperties=>archivedVersions in the DATS.json, an additional config file, or some other location. The text below refers to this as archivedVersions; the name is a placeholder.
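If the DATS.json option were chosen, storage could look roughly like the following sketch. It assumes the DATS convention that `extraProperties` is a list of `{"category": ..., "values": [{"value": ...}]}` objects; `archivedVersions` is the proposal's placeholder name, and both helper functions are hypothetical. Each archived entry is serialised to a JSON string so that every `value` stays a plain string.

```python
import json

# Placeholder category name from the proposal; not an existing DATS/CONP field.
ARCHIVE_CATEGORY = "archivedVersions"


def get_archived_versions(dats: dict) -> list:
    """Return the archived {version, ark, commit} entries stored in extraProperties."""
    for prop in dats.get("extraProperties", []):
        if prop.get("category") == ARCHIVE_CATEGORY:
            return [json.loads(v["value"]) for v in prop.get("values", [])]
    return []


def append_archived_version(dats: dict, version: str, ark: str, commit: str) -> None:
    """Append one archived version to the parsed DATS dict, in place."""
    entry = {"value": json.dumps({"version": version, "ark": ark, "commit": commit}, sort_keys=True)}
    props = dats.setdefault("extraProperties", [])
    for prop in props:
        if prop.get("category") == ARCHIVE_CATEGORY:
            prop.setdefault("values", []).append(entry)
            return
    # No archivedVersions entry yet: create the category with this first value.
    props.append({"category": ARCHIVE_CATEGORY, "values": [entry]})


dats = {
    "title": "Example dataset",
    "version": "1.0",
    "extraProperties": [{"category": "logo", "values": [{"value": "logo.png"}]}],  # unrelated example property
}
append_archived_version(dats, "1.0", "ark:/99999/fk4v1", "0123abc")
print(get_archived_versions(dats))
# → [{'ark': 'ark:/99999/fk4v1', 'commit': '0123abc', 'version': '1.0'}]
```

Keeping the history inside DATS.json means it travels with the dataset; a separate config file would avoid touching the metadata schema at the cost of a second file to keep in sync.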

Handling case A:

This function is triggered when a user changes an existing value in the version field. (I believe population of the version field on initial data submission can remain as currently implemented; some CONP datasets refer to concluded projects and no further updates are envisioned.)
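The trigger for case A could be sketched as follows, assuming the old and new DATS.json are available as parsed dicts. Only a change to an existing, non-empty version value fires; initial submission is left to the current process. The `state` layout, `previous_commit` and the `mint_ark` callable are hypothetical stand-ins, since the proposal does not specify how ARKs are minted.

```python
from typing import Callable, Optional


def version_changed(old: Optional[dict], new: dict) -> bool:
    """True only when an existing, non-empty version value is modified."""
    if not old or not old.get("version"):
        return False  # initial submission: handled as currently implemented
    return new.get("version") != old["version"]


def handle_update(
    old: Optional[dict],
    new: dict,
    state: dict,
    previous_commit: str,
    mint_ark: Callable[[], str],
) -> dict:
    """Archive the current ARK against the old version's commit and mint a new ARK.

    state: {"ark": str, "archivedVersions": list} (hypothetical layout).
    Returns a new state dict; the input state is not mutated.
    """
    if not version_changed(old, new):
        return state
    archived = state["archivedVersions"] + [
        {"version": old["version"], "ark": state["ark"], "commit": previous_commit}
    ]
    return {"ark": mint_ark(), "archivedVersions": archived}


state = {"ark": "ark:/99999/fk4v1", "archivedVersions": []}
new_state = handle_update({"version": "1.0"}, {"version": "1.1"}, state, "0123abc", lambda: "ark:/99999/fk4v2")
print(new_state["ark"])
# → ark:/99999/fk4v2
```

Passing `mint_ark` in as a function keeps the versioning logic independent of whichever ARK minting service is eventually used.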

workflow:

interface changes:

Case B will be handled by CONP developers on a case-by-case basis and is generally not expected to trigger a version update at this time.

Case C requires retroactive adjustments to all versions of a dataset that we serve. @samirdas suggested that removing the data from the underlying dataset would suffice, though this will leave broken links and error messages in a DataLad download.

This is very preliminary; feedback is much appreciated.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 5 months with no activity. Remove the stale label or comment, or this issue will be closed in 3 months.