bmir-radx / radx-data-dictionary-specification

A specification for CSV data dictionaries in the RADx data hub
BSD 2-Clause "Simplified" License

Add columns to reference and describe the provenance of CDEs #6

Closed matthewhorridge closed 11 months ago

matthewhorridge commented 1 year ago

It's generally the case that each field in a data dictionary corresponds to a Common Data Element (CDE). We need some extra columns to provide a reference to the CDE, the source of the CDE, etc.

Cc @pwrose and @graybeal

graybeal commented 1 year ago

The reference to the CDE should be required to be an identifier (an IRI/URL, since a registered CDE should have a unique, resolvable identifier). Possibly that is just the @id of the referenced item.

I think there's a useful pattern for the other provenance information in the DataCite description patterns; this is just a particular digital object that needs provenance. But I understand not everyone likes the DataCite description patterns.

pwrose commented 1 year ago

The RADx-rad data dictionary has a "CDE Reference" column which is a list of values that is used in two ways:

  1. Origin (creator) of the CDE. Current values are: RADx-rad Minimum CDE (for the 46 min. CDEs), RADx-rad DCC (for almost all other CDEs), NWSS_DCIPHER_Data_Dictionary_v2.0.0_20210319 (wastewater CDEs from the CDC), RADx-UP Testing Core (for COVID test results)
  2. Additional references and resources related to a data element. These references are all URLs, e.g., https://www.ncbi.nlm.nih.gov/taxonomy, https://cov-lineages.org/

Here are some examples (the origin is the first element, followed by zero or more references):

RADx-rad DCC|https://www.ncbi.nlm.nih.gov/taxonomy
RADx-rad DCC|https://www.uniprot.org/
RADx-rad DCC|https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/overview-COVID-19-vaccines.html?s_cid=11758:covid%20vaccine%20brands:sem.ga:p:RG:GM:gen:PTN:FY22

Since the CDE Reference field is used for two related but different purposes, one could consider splitting this into two fields.
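
As a rough illustration of what such a split could look like, here is a minimal Python sketch. It assumes the pipe character is the only separator and that the first element is always the origin, as in the examples above. The function name `split_cde_reference` and the tuple it returns are hypothetical and not part of the specification.

```python
from typing import List, Tuple


def split_cde_reference(value: str, sep: str = "|") -> Tuple[str, List[str]]:
    """Split a combined "CDE Reference" value into (origin, references).

    Assumption: the first element is the origin (creator) of the CDE and
    every later element is a reference URL.
    """
    parts = [part.strip() for part in value.split(sep)]
    origin = parts[0]
    # Drop empty elements, e.g. those produced by a trailing separator.
    references = [part for part in parts[1:] if part]
    return origin, references


if __name__ == "__main__":
    examples = [
        "RADx-rad DCC|https://www.ncbi.nlm.nih.gov/taxonomy",
        "RADx-rad DCC|https://www.uniprot.org/",
        "RADx-rad Minimum CDE",
    ]
    for example in examples:
        origin, references = split_cde_reference(example)
        print(origin, references)
```

A value with no references, such as a bare origin, yields an empty reference list, so the two resulting fields can be filled independently.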

pwrose commented 1 year ago

@matthewhorridge you completed this issue with the new fields: Provenance and SeeAlso.

You can close this issue.