PennTURBO / semantic-engine

TURBO semantic engine (Drivetrain). Transforms source-dependent RDF data into a source-independent, semantically rich RDF model.

Conditional hasRequiredInput (vs partial instantiation)? #60

turbomam closed this issue 4 years ago

turbomam commented 5 years ago

I figured out a workaround for this problem, so it's a lower priority now.


There are 1152 rows in the Synthea OMOP person table. All 1152 become Stardog direct mapping "shortcuts".

But only 1001 Homo sapiens are being instantiated in the expanded graph. I made all of the inputs besides the key symbol (person_id) optional.

drivetrain:Patients
  a ontologies:TURBO_0010354 ;
  drivetrain:hasOptionalInput drivetrain:PatientCridsymShortcut ;
  drivetrain:hasOptionalInput drivetrain:PatientDobShortcut ;
  drivetrain:hasOptionalInput drivetrain:PatientGidShortcut ;
  drivetrain:hasOptionalInput drivetrain:PatientRidShortcut .

But I have left the 'has OMOP concept ID' and subClassOf patterns as RequiredInputs. If GID or RID values are present in the relational data, I want to enforce the concept mapping and the subClassOf restrictions.

select
    p.gender_concept_id,
    p.race_concept_id,
    count(1)
from
    cdm_synthea10.person p
group by
    p.gender_concept_id,
    p.race_concept_id
order by
    count(1) desc

I haven't checked DOBs, but there are 151 rows with racial concept ID values of 0 (the Hispanic people in Synthea).

gender_concept_id  race_concept_id  count
8532               8527             251
8507               8516             246
8507               8527             228
8532               8516             226
8532               0                80
8507               0                71
8532               8515             27
8507               8515             23

turbomam commented 5 years ago

Mapping OMOP racial concept 0 to plain old RID obo:OMRSE_00000098 does bring the person count back up to 1152.
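
As a quick arithmetic check, the grouped counts from the SQL result above can be summed to confirm that the rows with race_concept_id 0 account for the whole gap between 1152 and 1001. This is only a sketch: the rows are copied from the table in this thread, and the variable names are illustrative.

```python
# (gender_concept_id, race_concept_id, count) rows copied from the query result above.
rows = [
    (8532, 8527, 251),
    (8507, 8516, 246),
    (8507, 8527, 228),
    (8532, 8516, 226),
    (8532, 0, 80),
    (8507, 0, 71),
    (8532, 8515, 27),
    (8507, 8515, 23),
]

# Total persons in the table.
total = sum(count for _, _, count in rows)

# Persons whose race concept ID is 0, i.e. has no matching class in the ontology.
unmapped = sum(count for _, race, count in rows if race == 0)

print(total, unmapped, total - unmapped)  # prints: 1152 151 1001
```

The difference matches the number of Homo sapiens instantiated before concept 0 was mapped, which is consistent with the required race pattern being the only thing excluding people.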

greenguy33 commented 5 years ago

I think it would be best to get in the habit of posting the SPARQL query that is generated by your process specification for these kinds of issues. That is the easiest way to start debugging such a problem.

turbomam commented 5 years ago

Here it is, from run PrintQuery, converted into a SELECT.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX turbo: <http://transformunify.org/ontologies/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT *
WHERE {
    GRAPH <https://raw.githubusercontent.com/PennTURBO/Turbo-Ontology/master/ontologies/turbo_merged.owl> {
        ?GidClassList <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/OMRSE_00000133> .
        ?RidClassList <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://purl.obolibrary.org/obo/OMRSE_00000098> .
        ?GidClassList <http://transformunify.org/ontologies/TURBO_0010147> ?gender_LiteralValue .
        ?RidClassList <http://transformunify.org/ontologies/TURBO_0010147> ?race_LiteralValue .
    }
    GRAPH <https://github.com/PennTURBO/Drivetrain/omopShortcuts> {
        ?person <http://api.stardog.com/person#person_id> ?person_keysym_LiteralValue .
        ?person rdf:type <http://api.stardog.com/person> .
        OPTIONAL {
            ?person <http://api.stardog.com/person#person_source_value> ?person_cridsym_LiteralValue .
        }
        OPTIONAL {
            ?person <http://api.stardog.com/person#birth_datetime> ?birth_datetime_DateLiteralValue .
        }
        OPTIONAL {
            ?person <http://api.stardog.com/person#gender_concept_id> ?gender_LiteralValue .
        }
        OPTIONAL {
            ?person <http://api.stardog.com/person#race_concept_id> ?race_LiteralValue .
        }
    }
}
greenguy33 commented 5 years ago

At first glance, this seems like a good case to use the drivetrain:OptionalGroup feature, to group your searches for gender_LiteralValue/GidClassList and race_LiteralValue/RidClassList into a single optional block.

turbomam commented 5 years ago

I think this would satisfy my requirement: if a gender concept ID matches some class via ontologies:TURBO_0010147, then that class must be a subClassOf* 'gender identity datum'.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX turbo: <http://transformunify.org/ontologies/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT *
WHERE {
    GRAPH <https://github.com/PennTURBO/Drivetrain/omopShortcuts> {
        ?person <http://api.stardog.com/person#person_id> ?person_keysym_LiteralValue .
        ?person rdf:type <http://api.stardog.com/person> .
        OPTIONAL {
            ?person <http://api.stardog.com/person#person_source_value> ?person_cridsym_LiteralValue .
        }
        OPTIONAL {
            ?person <http://api.stardog.com/person#birth_datetime> ?birth_datetime_DateLiteralValue .
        }
    }
    OPTIONAL {
        GRAPH <https://raw.githubusercontent.com/PennTURBO/Turbo-Ontology/master/ontologies/turbo_merged.owl> {
            ?GidClassList <http://transformunify.org/ontologies/TURBO_0010147> ?gender_LiteralValue .
            ?GidClassList <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/OMRSE_00000133> .
        }
        GRAPH <https://github.com/PennTURBO/Drivetrain/omopShortcuts> {
            ?person <http://api.stardog.com/person#gender_concept_id> ?gender_LiteralValue .
        }
    }
    OPTIONAL {
        GRAPH <https://raw.githubusercontent.com/PennTURBO/Turbo-Ontology/master/ontologies/turbo_merged.owl> {
            ?RidClassList <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://purl.obolibrary.org/obo/OMRSE_00000098> .
            ?RidClassList <http://transformunify.org/ontologies/TURBO_0010147> ?race_LiteralValue .
        }
        GRAPH <https://github.com/PennTURBO/Drivetrain/omopShortcuts> {
            ?person <http://api.stardog.com/person#race_concept_id> ?race_LiteralValue .
        }
    }
}
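
To illustrate the difference between the two queries in this thread, here is a toy Python model of the join behavior. It is not Drivetrain or SPARQL code: the person IDs, the class names, and both helper functions are hypothetical, and the model only mimics how a required pattern drops unmatched people while a grouped OPTIONAL keeps them with unbound variables.

```python
# Hypothetical data: person -> race_concept_id from the shortcut graph.
persons = {"person1": 8527, "person2": 8516, "person3": 0}

# Hypothetical ontology lookup: concept ID -> class that carries that ID
# and sits under the required superclass. Concept 0 has no entry.
race_classes = {8527: "RaceClassA", 8516: "RaceClassB"}


def required_match(persons, classes):
    """Model the original query: the ontology pattern is required,
    so a person without a matching class yields no solution at all."""
    return {p: classes[c] for p, c in persons.items() if c in classes}


def optional_match(persons, classes):
    """Model the rewritten query: the ontology pattern and the person's
    concept ID sit together in one OPTIONAL, so every person is kept and
    the class is simply unbound (None) when nothing matches."""
    return {p: classes.get(c) for p, c in persons.items()}


print(required_match(persons, race_classes))
print(optional_match(persons, race_classes))
```

In this model the required version returns two people and the optional version returns all three, with person3 left without a race class.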
greenguy33 commented 5 years ago

See issue #64