cancerDHC / tools

A repository for the work of the Tools workstream for CCDH

create example CRDC-H data in JSON/linkml #27

Closed: balhoff closed this issue 3 years ago

balhoff commented 3 years ago

For testing and demonstrating validation services, we need a collection of sample data adhering to the CRDC-H model.

gaurav commented 3 years ago

How about https://github.com/cancerDHC/example-data as a repository name?

gaurav commented 3 years ago

This repository now exists: https://github.com/cancerDHC/example-data. I've also added two suggestions for possible data we could pull into the examples for now, but hopefully other CCDH people will have better ideas.

gaurav commented 3 years ago

CDA described their ETL process at our meeting yesterday. They import data in JSON format from the GDC/PDC APIs and then transform it into the CDA representation (which is based on the CRDC-H model). They are working on a mapping YAML file to drive this transformation, covering both restructuring the data and mapping values to concepts. We can benefit from this transformation by using data from the CDA rather than pulling it directly from the GDC/PDC portals. If we develop our own methods for extracting data from other nodes, we should make sure those transformations eventually end up in those mapping files.
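To make the idea of a mapping-driven transformation concrete, here is a minimal sketch of how such a mapping could restructure a record and map values to concepts. It is not the CDA's actual implementation: the mapping layout, the field names, the `EXAMPLE:` concept codes, and the `get_path`/`transform` helpers are all assumptions made up for illustration, and the mapping is written as an inline dictionary rather than loaded from a YAML file.

```python
from typing import Any, Optional

# Hypothetical mapping; a real one would presumably be loaded from the mapping YAML file.
# "fields" maps a target field name to a dotted path in the source record.
# "values" maps source values to concept identifiers (placeholder codes, not real concepts).
MAPPING = {
    "fields": {
        "id": "case_id",
        "sex": "demographic.gender",
        "primary_site": "primary_site",
    },
    "values": {
        "sex": {"female": "EXAMPLE:0001", "male": "EXAMPLE:0002"},
    },
}


def get_path(record: dict, dotted_path: str) -> Optional[Any]:
    """Follow a dotted path through nested dictionaries; return None if any key is missing."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def transform(record: dict, mapping: dict) -> dict:
    """Restructure a source record and map its values according to the mapping."""
    result = {}
    for target, source_path in mapping["fields"].items():
        value = get_path(record, source_path)
        if value is None:
            continue  # Skip fields that are absent from the source record.
        value_map = mapping.get("values", {}).get(target)
        if value_map is not None and isinstance(value, str):
            # Unmapped values are passed through unchanged.
            value = value_map.get(value, value)
        result[target] = value
    return result


# Made-up source record, loosely shaped like JSON returned by a node API.
source = {
    "case_id": "case-1",
    "demographic": {"gender": "female"},
    "primary_site": "Lung",
}
transformed = transform(source, MAPPING)
print(transformed)
```

Keeping the restructuring rules and the value-to-concept rules in one declarative mapping means a transformation written for another node could be contributed as data rather than as new code.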

(Incidentally, during my conversation with @jiaola today, he pointed out that the code he's written at https://github.com/HOT-Ecosystem/crdc-node-models could be used to generate a mapping YAML file directly from the CRDC-H model description in Google Sheets, which might be useful to the CDA going forward.)

balhoff commented 3 years ago

See https://github.com/cancerDHC/example-data.