cancerDHC / tools

A repository for the work of the Tools workstream for CCDH

data mapping service using linkml value sets #31

Open balhoff opened 3 years ago

balhoff commented 3 years ago

This will build on the validation mechanism defined in #29. Given some value set definitions within a linkml model, we should be able to map input data to likely data elements and values. This service could later be connected with a tool like Ptolemy to provide CRDC-H support.
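As a rough illustration of what "map input data to likely data elements and values" could mean, here is a minimal sketch. It assumes the value sets have already been extracted from the LinkML model into a plain dictionary; the element names, permissible values, and the `normalize` and `candidate_elements` helpers are all hypothetical, and matching is only by normalized string equality.

```python
# Hypothetical value sets. In practice these would be read from the
# enumerations defined in the LinkML model rather than hard-coded.
VALUE_SETS = {
    "BodySite.site": {"Lung", "Liver", "Kidney"},
    "Specimen.specimen_type": {"Fresh Specimen", "Fixed Specimen"},
}


def normalize(value: str) -> str:
    """Lower-case a value and collapse runs of whitespace."""
    return " ".join(value.strip().lower().split())


def candidate_elements(value, value_sets=VALUE_SETS):
    """Return (element, permissible value) pairs that match the input value.

    A match here means equality after normalization; a real service would
    probably also need synonyms or concept-level mappings.
    """
    key = normalize(value)
    matches = []
    for element, permitted in sorted(value_sets.items()):
        for permissible_value in sorted(permitted):
            if normalize(permissible_value) == key:
                matches.append((element, permissible_value))
    return matches
```

Calling `candidate_elements("  lung ")` on the sample data returns the single pair for `BodySite.site`, while a value that appears in no value set returns an empty list, which a caller could treat as "needs manual review".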

balhoff commented 3 years ago

Given the late-breaking addition of enumerations to the model, implementation of this will need to be deferred to Phase 3.

gaurav commented 3 years ago

I think this breaks down into two tasks:

  1. Demonstrate that transformations can be set up using Python data classes automatically generated from the LinkML model: cancerDHC/example-data#8
  2. Figure out if we can automate that transformation-generation process, i.e. if the model could tell you how to transform a GDC:Sample.biospecimen_anatomic_site to a CCDH:BodySite.site, or how to transform data from v1.0.1 of the CCDH model to v2.0. We're planning to do this in two ways:
    1. The Data Model Harmonization team is looking into coming up with a format for recording this transformation information in the model itself.
    2. Currently, enumerated values in the CCDH model are taken directly from the node data dictionaries, so CCDH:BodySite.site uses the union of the values used by GDC:Sample.biospecimen_anatomic_site, PDC:Sample.biospecimen_anatomic_site, and the other node mappings. Once we start mapping these values to concepts and removing duplicates from these lists, I think the Terminology team plans to produce SSSOM files of those mappings, and the Tools team would then build tools that map values using those SSSOM files.

Does that sound right? Or am I missing something?
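To make the SSSOM-based option concrete, here is a minimal sketch of what such a tool might do. It assumes the Terminology team's files follow the usual SSSOM TSV layout (optional `#`-prefixed metadata lines, then a tab-separated table with `subject_id`, `predicate_id`, and `object_id` columns). The sample rows, the `EX:` concept identifiers, and the `load_mappings` and `map_value` helpers are made up for illustration.

```python
import csv
import io

# Made-up SSSOM TSV content. The subject and object identifiers are
# placeholders, not real node values or terminology concepts.
SSSOM_TSV = """\
# mapping_set_id: https://example.org/hypothetical-mapping-set
subject_id\tsubject_label\tpredicate_id\tobject_id\tmapping_justification
GDC:Lung\tLung\tskos:exactMatch\tEX:0001\tsemapv:ManualMappingCuration
PDC:lung\tlung\tskos:exactMatch\tEX:0001\tsemapv:ManualMappingCuration
GDC:Chest\tChest\tskos:broadMatch\tEX:0002\tsemapv:ManualMappingCuration
"""


def load_mappings(text, predicate="skos:exactMatch"):
    """Build a subject_id -> object_id lookup from SSSOM TSV text.

    Metadata lines starting with '#' are skipped, and only rows with the
    requested predicate are kept.
    """
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {
        row["subject_id"]: row["object_id"]
        for row in reader
        if row["predicate_id"] == predicate
    }


def map_value(subject_id, mappings):
    """Return the mapped concept for a node value, or None if unmapped."""
    return mappings.get(subject_id)
```

With this sample data, the GDC and PDC spellings of the same value resolve to one shared concept, which is the de-duplication step described above. Values with no exact match come back as `None` so they can be flagged rather than silently dropped.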