Closed by kailingchen 1 month ago
Draft of standard MDF data model instance with these features added here.
Only the diagnosis node has an ID field. I think other nodes, such as study, participant, sample, and file, should have their own ID fields, as the diagnosis node does.
We do have `id` under the `mustHave` key in `UniversalNodeProperties`, which indicates that each node will have a property matching the `id` defined in `PropDefinitions`. Is there also a need for a `{node_handle}_id` field for each node? Or do you just need a property with `Key: true` for each node? I could add that to `id`.
I think we just need a property with `Key: true` for each node.
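For reference, the requirement discussed above (every node ends up with at least one property marked `Key: true`) could be checked with something like the sketch below. It assumes the MDF has already been loaded into a plain dict with top-level `Nodes`, `PropDefinitions`, and optional `UniversalNodeProperties` sections; the function name `nodes_missing_key` and the sample property names are hypothetical and are not part of bento-mdf.

```python
def nodes_missing_key(model: dict) -> list[str]:
    """Return handles of nodes that have no property defined with Key: true.

    Assumes (not confirmed by the thread) that each node lists its property
    names under "Props" and that definitions live under "PropDefinitions".
    """
    prop_defs = model.get("PropDefinitions") or {}
    # Properties every node must have, per UniversalNodeProperties.mustHave.
    universal = (model.get("UniversalNodeProperties") or {}).get("mustHave") or []

    missing = []
    for handle, node in (model.get("Nodes") or {}).items():
        props = list((node or {}).get("Props") or []) + list(universal)
        has_key = any((prop_defs.get(p) or {}).get("Key") is True for p in props)
        if not has_key:
            missing.append(handle)
    return missing


# Hypothetical miniature model: only diagnosis carries its own key property.
model = {
    "Nodes": {
        "study": {"Props": ["study_name"]},
        "diagnosis": {"Props": ["diagnosis_id", "disease_type"]},
    },
    "PropDefinitions": {
        "study_name": {"Type": "string"},
        "diagnosis_id": {"Type": "string", "Key": True},
        "disease_type": {"Type": "string"},
    },
}
print(nodes_missing_key(model))  # ['study']

# Marking the universal id as a key covers every node at once.
model["UniversalNodeProperties"] = {"mustHave": ["id"]}
model["PropDefinitions"]["id"] = {"Type": "string", "Key": True}
print(nodes_missing_key(model))  # []
```

Setting `Key: true` on the universal `id` satisfies the check for all nodes without adding a separate `{node_handle}_id` field to each one.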
Standard model updated and on main branch here: https://github.com/CBIIT/bento-mdf/blob/main/drivers/python/tests/samples/crdc_datahub_mdf.yml
Data Hub derives validation information, such as CDEs, permissible values, and required properties, from data models described in the Model Description Format (MDF), the same format used to describe data models for Bento installations. Data Hub also uses MDF to produce the data commons specific loading files that are the end product of data submission.
Once a submitter has started a submission with a specific version of a data model, they are allowed to continue using that version to complete their submission, regardless of any updates from the downstream data commons. This means that Data Hub will be supporting multiple versions of data models from multiple data commons. The QA burden for testing all these models and model versions is not sustainable.
Therefore, the Data Hub development, data, and QA teams will work with the CTOS data team to develop a comprehensive Standard MDF Model that encompasses the types of features that can be found in MDF files:
As development proceeds, the Standard MDF Model will be used to test Data Hub in place of the data commons specific data models. These tests will include:
Use of the Standard MDF Model for testing does raise a few assumptions and risks that are known and accepted: