ChildMindInstitute / mhdb-tables2turtles

Text processing code to convert specific spreadsheets to RDF as initial content for the Mental Health Database (MHDB)

restructure references to cover structures, classes and labels #55

Closed shnizzedy closed 6 years ago

shnizzedy commented 6 years ago

In revised structure for neutral states & lead questions, I'm confused about what the "references" column relates to on that spreadsheet.

The way that "CMI InnovativeTech Lab split" and "CMI InnovativeTech Lab split based on age group" are phrased makes me think the references refer to the structure of the signOrSymptom hierarchy, rather than to the signsOrSymptoms themselves.

I also think it would be pretty cynical and disingenuous of us to say, for example, that the DSM is the source of the sign "Often fails to give close attention to details or makes careless mistakes in schoolwork, at work, or during other activities" but that we are the source of the signs "Often fails to give close attention to details" and "Often makes careless mistakes in schoolwork, at work, or during other activities". On the other hand, I think it makes a lot of sense to say that we are the source of the assertion that "Often fails to give close attention to details" and "Often makes careless mistakes in schoolwork, at work, or during other activities" are subClass[es]Of "Often fails to give close attention to details or makes careless mistakes in schoolwork, at work, or during other activities".
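To make the distinction concrete, here is a minimal sketch of attaching a source to the subClassOf assertion rather than to the sign itself, using the same reification vocabulary (rdf:subject, rdf:predicate, rdf:object, dcterms:source) that the query below matches on. Every `mhdb:` local name in it, including the statement and lab identifiers, is a hypothetical placeholder and not an identifier taken from the MHDB.

```python
# Minimal sketch: reify an rdfs:subClassOf assertion so that the assertion,
# not the sign, carries the dcterms:source. All mhdb: local names below are
# hypothetical placeholders, not identifiers from the MHDB.

def reify(statement_id, subject, predicate, obj, source):
    """Return the triples of a reified RDF statement plus its dcterms:source."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
        (statement_id, "dcterms:source", source),
    ]


def to_turtle(triples):
    """Serialize prefixed-name triples as one Turtle statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = reify(
    "mhdb:Statement_1",                            # hypothetical statement node
    "mhdb:FailsToGiveCloseAttentionToDetails",     # hypothetical split sign
    "rdfs:subClassOf",
    "mhdb:FailsAttentionOrMakesCarelessMistakes",  # hypothetical combined sign
    "mhdb:ChildMindInstituteMATTERLab",            # hypothetical source node
)
print(to_turtle(triples))
```

With this shape, the combined sign can keep the DSM as its own source while only the split relation is attributed to the lab.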

The query

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX health-lifesci: <http://schema.org/>
PREFIX mhdb: <http://www.purl.org/mentalhealth#>
PREFIX : <http://www.purl.org/mentalhealth#>
PREFIX mhdbnb: <http://www.purl.org/mentalhealth/neutralstates#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>

SELECT ?source ?subject ?predicate ?object
WHERE { 
  ?statement dcterms:source ?z_node .
  ?z_node rdfs:label ?source .
  ?statement rdf:subject ?x_node .
  ?x_node rdfs:label ?subject .
  ?statement rdf:object ?o_node .
  ?o_node rdfs:label ?object .
  ?statement rdf:predicate ?predicate .
} 
GROUP BY ?subject ?source ?predicate ?object 
ORDER BY ?source ?object ?subject ?predicate 

run on our workbench will show the reference as the source of triples.
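For anyone reading the query without a workbench at hand, the join it performs can be sketched in plain Python over a list of triples. This is an illustration only and not how the workbench evaluates SPARQL; it assumes each statement node has at most one value per reification property and each labeled node has a single rdfs:label. The `ex:` names and labels are made-up example data.

```python
# Plain-Python sketch of the join the SPARQL query performs.
# Assumption: at most one value per reification property per statement node,
# and one rdfs:label per labeled node. The ex: names are placeholders.

REIFICATION = ("dcterms:source", "rdf:subject", "rdf:predicate", "rdf:object")


def statement_sources(triples):
    """Return sorted (source, subject, predicate, object) rows."""
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    statements = {}
    for s, p, o in triples:
        if p in REIFICATION:
            statements.setdefault(s, {})[p] = o
    rows = set()  # a set deduplicates rows, as the GROUP BY does in the query
    for parts in statements.values():
        try:
            rows.add((
                labels[parts["dcterms:source"]],
                labels[parts["rdf:subject"]],
                parts["rdf:predicate"],
                labels[parts["rdf:object"]],
            ))
        except KeyError:
            continue  # like the query, skip statements missing a part or label
    # Mirrors ORDER BY ?source ?object ?subject ?predicate
    return sorted(rows, key=lambda r: (r[0], r[3], r[1], r[2]))


example = [
    ("ex:stmt1", "dcterms:source", "ex:dsm"),
    ("ex:stmt1", "rdf:subject", "ex:signA"),
    ("ex:stmt1", "rdf:predicate", "rdfs:subClassOf"),
    ("ex:stmt1", "rdf:object", "ex:signB"),
    ("ex:dsm", "rdfs:label", "DSM"),
    ("ex:signA", "rdfs:label", "sign A"),
    ("ex:signB", "rdfs:label", "sign B"),
    ("ex:stmt2", "rdf:subject", "ex:signA"),  # no source, so not returned
]
rows = statement_sources(example)
print(rows)
```

Note that only reified statements whose source, subject and object all carry an rdfs:label produce a row, which is why unlabeled nodes never show up in the results.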

Right now, the sources of statements that are included are:

What we don't have is any indication of structure of any of the signsOrSymptoms attributed to us in revised structure for neutral states & lead questions.

My questions:

  1. Does all this seem reasonable and accurate?
  2. Do we care to encode the DSM as the source of the Disorder structure as well?
  3. Do we want to encode sources for labels?
  4. We need to encode where our split signsOrSymptoms were split from, right?
  5. Can we have our lab be a single line-item in the references and encode the rest in the graph structure? (I don't think "CMI InnovativeTech Lab split" or "CMI InnovativeTech Lab split based on age group" make sense as references. I think "Child Mind Institute MATTER Lab" should be the source of a split in the graph; "age group" could be a reason for a split in the graph (if we even care about encoding that))

binarybottle commented 6 years ago

Yes to 1-5.

shnizzedy commented 6 years ago

Notes:

  1. The references are encoded in triplicate, with the splits encoded in duplicate, so I'm going to trash the code pulling references from revised structure for neutral states & lead questions and rely on the structure given in mentalhealth instead. I'll rename this issue and update its tags and assignment accordingly.
  2. The references marked "split" (as opposed to "split by age group") are all split at conjunctions, per @anirudh4792.

binarybottle commented 6 years ago

Anything remaining to do for this issue?

shnizzedy commented 6 years ago

This is being resolved in #53 and #54.