CBIIT / bento-mdf

Bento's graph model definition framework

Provide a Standard MDF data model to Data Hub #37

Closed kailingchen closed 1 month ago

kailingchen commented 2 months ago

Data Hub derives validation information, such as CDEs, permissible values, and required properties, from data models described in the Model Description Format (MDF), the format used to describe data models for Bento installations. Data Hub also uses MDF to produce the data commons-specific loading files that are the end product of data submission.
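
As a rough illustration of how required properties and permissible values might be read out of a model, here is a minimal sketch. It is not Data Hub's actual code. It assumes the MDF file has already been parsed into a Python dict with `Nodes` and `PropDefinitions` sections, and that a property definition marks required properties with `Req: true` and lists permissible values under `Enum`. The `sample` node, its properties, and the `validation_rules` helper are all hypothetical.

```python
# Hypothetical MDF fragment as a Python dict (normally parsed from YAML).
mdf = {
    "Nodes": {
        "sample": {"Props": ["sample_type", "collection_date"]},
    },
    "PropDefinitions": {
        # Assumed conventions: Req marks a required property,
        # Enum lists the permissible values.
        "sample_type": {"Req": True, "Enum": ["tumor", "normal"]},
        "collection_date": {"Type": "datetime"},
    },
}


def validation_rules(mdf, node):
    """Collect required properties and permissible values for one node."""
    defs = mdf.get("PropDefinitions", {})
    required = []
    permissible = {}
    for prop in mdf["Nodes"][node].get("Props") or []:
        spec = defs.get(prop, {})
        if spec.get("Req") is True:
            required.append(prop)
        if "Enum" in spec:
            permissible[prop] = list(spec["Enum"])
    return required, permissible


required, permissible = validation_rules(mdf, "sample")
print(required)
print(permissible)
```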

Once a submitter has started a submission with a specific version of a data model, they are allowed to continue using that version to complete their submission, regardless of any updates from the downstream data commons. This means that Data Hub will be supporting multiple versions of data models from multiple data commons. The QA burden for testing all these models and model versions is not sustainable.

Therefore, the Data Hub development, data, and QA teams will work with the CTOS data team to develop a comprehensive Standard MDF Model that encompasses the types of features that can be found in MDF files:

The Standard MDF Model will be used to test Data Hub as development proceeds in place of the data commons specific data models. These tests will include:

Using the Standard MDF Model for testing carries a few assumptions and risks, which are known and accepted:

nelsonwmoore commented 1 month ago

Draft of standard MDF data model instance with these features added here.

wfy1997 commented 1 month ago

Only the diagnosis node has an ID field. I think the other nodes, such as study, participant, sample, and file, should have their own ID fields, like the diagnosis node does.

nelsonwmoore commented 1 month ago

We do have id under the mustHave key in UniversalNodeProperties, which indicates that each node will have a property matching the id defined in PropDefinitions. Is there also a need for a {node_handle}_id field for each node? Or do you just need a property with Key: true for each node? I could add that to id.

wfy1997 commented 1 month ago

I think we just need a property with Key: true for each node.
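
The requirement discussed here (every node ends up with at least one property marked `Key: true`, either listed directly or inherited through `mustHave` in `UniversalNodeProperties`) can be checked with a small sketch like the one below. It works on a hypothetical model fragment held as a plain Python dict, not on the actual standard model file, and the helper names `key_props_by_node` and `nodes_missing_key` are made up for illustration.

```python
# Hypothetical MDF fragment as a Python dict (normally parsed from YAML).
mdf = {
    "Nodes": {
        "study": {"Props": ["study_name"]},
        "diagnosis": {"Props": ["diagnosis_id", "disease"]},
    },
    # Properties every node must carry, as described in the thread.
    "UniversalNodeProperties": {"mustHave": ["id"]},
    "PropDefinitions": {
        "id": {"Type": "string", "Key": True},
        "diagnosis_id": {"Type": "string", "Key": True},
        "study_name": {"Type": "string"},
        "disease": {"Type": "string"},
    },
}


def key_props_by_node(mdf):
    """Map each node to its properties whose definition has Key: true."""
    defs = mdf.get("PropDefinitions", {})
    universal = mdf.get("UniversalNodeProperties", {}).get("mustHave") or []
    result = {}
    for node, spec in mdf.get("Nodes", {}).items():
        # A node's effective properties: its own plus the universal ones.
        props = list(spec.get("Props") or []) + list(universal)
        result[node] = [p for p in props if defs.get(p, {}).get("Key") is True]
    return result


def nodes_missing_key(mdf):
    """Return the nodes that have no key property at all."""
    return [node for node, keys in key_props_by_node(mdf).items() if not keys]


print(key_props_by_node(mdf))
print(nodes_missing_key(mdf))
```

In this fragment, marking the universal `id` as a key is enough to give `study` a key property without adding a `study_id` field, which is the option agreed on above.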

nelsonwmoore commented 1 month ago

Standard model updated and on main branch here: https://github.com/CBIIT/bento-mdf/blob/main/drivers/python/tests/samples/crdc_datahub_mdf.yml