ga4gh / fasp-scripts

Review variations in dbGaP data_dicts #22

Open ianfore opened 3 years ago

ianfore commented 3 years ago

There are some variations in how the dbGaP data_dict.xml files define data sources.

Some work to address this includes validating every dbGaP data_dict for conformance to an XML schema. After five iterations of that schema, 16,000 of approximately 20,000 data dictionaries validate. Simple changes to the XML schema significantly increase the number of data dictionaries that pass, which suggests this approach gives significant leverage on the problem. To do: put that code in GitHub.
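
As a rough illustration only (this is not the validation code mentioned above, which is not yet in the repo), here is a minimal sketch of a complementary way to survey the variations. It does not perform XML Schema validation. Instead it reduces each data_dict to a "signature", the set of element and attribute paths it uses, and counts how many files share each signature. The function names `element_paths` and `tally_signatures` are hypothetical, and it uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Return a sorted tuple of the element and attribute paths used in one document.

    Elements appear as 'parent/child'; attributes as 'parent/child/@attr'.
    Repeated elements collapse to a single path, so only structure is compared.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for attr in elem.attrib:
            paths.add(f"{path}/@{attr}")
        for child in elem:
            walk(child, path)

    walk(root, "")
    return tuple(sorted(paths))


def tally_signatures(documents):
    """Count how many documents share each structural signature.

    `documents` maps a label (for example a file name) to XML text.
    Documents that fail to parse are counted under ('<parse error>',).
    """
    counts = Counter()
    for _label, text in documents.items():
        try:
            counts[element_paths(text)] += 1
        except ET.ParseError:
            counts[("<parse error>",)] += 1
    return counts
```

Calling `tally_signatures` on a dict of file name to file contents and then `most_common()` on the result lists the most frequent structures first. The rare signatures are the natural candidates to inspect when deciding what the next schema iteration should allow.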