ChildMindInstitute / mhdb-tables2turtles

Text processing code to convert specific spreadsheets to RDF as initial content for the Mental Health Database (MHDB)

Reencode #12

Closed shnizzedy closed 6 years ago

shnizzedy commented 7 years ago

There's a lot of redundancy in our tables.

anirudh4792 commented 6 years ago

This should be taken care of once we have our 'non-redundant unique' list of questions

shnizzedy commented 6 years ago

When I say redundancy here, I'm talking about multiple indices for identical entities. For example, in the SignOrSymptomAllInOne sheet, each of these signs or symptoms is indexed multiple times verbatim:

"Abrupt cessation of or reduction in caffeine use, followed within 24 hours by flu-like symptoms" : 449, 859
"Amphetamine (or other stimulant) related symptoms" : 1050, 1012
"An eating or feeding disturbance as manifested by persistent failure to meet appropriate nutritional and/or energy needs" : 786, 288
"At some point during the course of the disorder, the individual has performed mental acts in response to the appearance concerns" : 713, 215
"At some point during the course of the disorder, the individual has performed repetitive behaviors in response to the appearance concerns" : 214, 712
"Clinically significant problematic behavioral or psychological changes that developed during, or shortly after, alcohol ingestion" : 856, 411
"Deficits in adaptive functioning" : 1106, 1108
"Deficits in intellectual functions" : 1105, 1107
"Depersonalization: Experiences of unreality, detachment, or being an outside observer with respect to one's thoughts, feelings, sensations, body, or actions" : 776, 269
"Derealization: Experiences of unreality or detachment with respect to surroundings" : 270, 777
"Directly experiencing the traumatic event(s)" : 729, 764
"Dissociative reactions in which the individual feels or acts as if the traumatic event(s) were recurring. (Such reactions may occur on a continuum, with the most extreme expression being a complete loss of awareness of present surroundings" : 733, 226
"Dissociative symptoms: An altered sense of the reality of one's surroundings or oneself" : 250, 770
"Exaggerated startle response" : 761, 746
"Fascination with, interest in, curiosity about, or attraction to fire and its situational contexts" : 374, 852
"Flashbacks" : 925, 916
"Hallucinations occur with intact reality testing " : 857, 430
"Hypervigilance" : 745, 760
"Illusions occur in the absence of a delirium" : 858, 431
"In children 6 years and younger, dissociative reactions in which the child feels or acts as if the traumatic event(s) were recurring. (Such reactions may occur on a continuum, with the most extreme expression being a complete loss of awareness of present surroundings.) Such trauma-specific reenactment may occur in play" : 752, 235
"Language abilities are substantially and quantifiably below those expected for age" : 658, 159
"Marked fear or anxiety about a specific object or situation" : 202, 703
"Negative mood: Persistent inability to experience positive emotions" : 769, 249
"No or little dream imagery is recalled" : 811, 346
"Nocturnal breathing disturbances" : 805, 313
"Often loses things necessary for tasks or activities" : 14, 511
"Problems with concentration" : 762, 747
"Rearing in unusual settings that severely limit opportunities to form selective attachments" : 728, 721
"Recurrent alcohol use resulting in a failure to fulfill major role obligations" : 388, 854
"Recurrent inappropriate compensatory behaviors in order to prevent weight gain" : 297, 792
"Recurrent purging behavior to influence weight or shape in the absence of binge eating" : 304, 793
"Repeated changes of primary caregivers that limit opportunities to form stable attachments" : 727, 720
"Repeated passage of feces into inappropriate places whether involuntary or intentional" : 306, 794
"Repetitive, seemingly driven, and apparently purposeless motor behavior" : 169, 688
"Sedative, hypnotic, or anxiolytic related symptoms" : 1011, 1049, 1061
"Sleep disturbance" : 748, 772, 763
"Social phobia based on animals" : 204, 704
"Social phobia based on natural environment" : 205, 705
"Social phobia based on other things" : 211, 708
"Social phobia for blood injection injury, fear of blood" : 706, 206
"Social phobia that is situational" : 210, 707
"The child has experienced a pattern of extremes of insufficient care" : 222, 220
"The typical sleep of individuals with nightmares is mildly impaired" : 812, 350
"Withdrawal, as manifested by transient tactile hallucinations developing within several hours to a few days after the cessation of (or reduction in) alcohol use" : 403, 424
"Witnessing, in person, the event(s) as it occurred to others" : 730, 765
"altered voluntary motor or sensory function: With abnormal movement" : 277, 783
"altered voluntary motor or sensory function: With special sensory symptom" : 282, 785
"altered voluntary motor or sensory function: With speech symptom" : 784, 279
"difficulty falling or staying asleep or restless sleep" : 920, 923
"eating or drinking" : 907, 869
"frequent changes in foster care" : 913, 911
"giving a speech" : 908, 870
"having a conversation, meeting unfamiliar people" : 906, 868
"institutions with high child-to-caregiver ratios" : 914, 912
"nausea, vomiting, or muscle pain/stiffness" : 1110, 965
"urinary symptoms" : 993, 995

anirudh4792 commented 6 years ago

I will give my suggestions for removing these redundancies after I have explained what they are and after another round of comments.

Types of redundancies:

1) I am for removing symptoms for disorders related to substance abuse and physical abuse and neglect (at least 20 varieties of each). For the first phase, I would like to focus on 'neurodevelopmental disorders' and 'anxiety disorders'.

2) Take this example: higher-level symptom: keeps track of things; lower-level symptom: loses things such as books, pens, spectacles, etc. We decided to segregate the examples (some were culturally relevant). However, books, pens and spectacles could not exist on their own, as that would make no sense. So I created a new workbook called examples and placed 'loses things' in one column and the examples in the next column. When we merged all of this into one sheet, it caused redundancies, but the examples stayed mapped to the symptom they were representing.

3) Urinary symptoms: I remember these. They are manual errors made while converting 'diagnostic specifiers' into symptoms by adding 'symptoms of' as a prefix or suffix to the diagnostic specifier, when some specifiers already had the word 'symptom'.

4) E.g. "altered voluntary motor or sensory function: With special sensory symptom" : 282, 785. Sometimes we had similar symptoms for disorders of 'different levels' that differed by a diagnostic specifier or inclusion/exclusion criteria. All the more reason to do away with these labels. In our case, we have converted these into individual symptoms.

Additional comment: 1) A question is sometimes repeated (usually for physical symptoms like headaches not due to 'natural causes') and related to different disorders. We see this a lot in questions taken from the CMI questionnaire, and I am curious to see how physical symptoms will play out with mental ones (a combination of one causing the other and vice versa).

shnizzedy commented 6 years ago

I don't think we're talking about the same thing. This issue is about the same exact entity having more than one index (which is only a problem at the spreadsheet level, not a problem with the data or the turtle). If and when we quit using the spreadsheets, this issue will be moot in the American sense of the word.


Re:

  4) E.g. "altered voluntary motor or sensory function: With special sensory symptom" : 282, 785. Sometimes we had similar symptoms for disorders of 'different levels' that differed by a diagnostic specifier or inclusion/exclusion criteria. All the more reason to do away with these labels. In our case, we have converted these into individual symptoms.

In each of the examples I posted here the format is [text of symptom] : [comma separated indices of that symptom on the SignOrSymptomAllInOne worksheet] where the listed indices correspond to the exact same text. Look up index 282. Look up index 785. If those ("altered voluntary motor or sensory function: With special sensory symptom" and "altered voluntary motor or sensory function: With special sensory symptom") are supposed to be different symptoms from each other, I need to start over with how I think of this entire project.
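
To make the format concrete, here is a minimal sketch of how a list like the one above could be produced. It assumes the worksheet has already been read into (index, label) pairs. The function name `find_duplicate_labels` and the sample rows are hypothetical and not taken from the repository; the duplicated labels and their indices are copied from the list above, and the single-occurrence row is made up for contrast.

```python
from collections import defaultdict


def find_duplicate_labels(rows):
    """Group indices by exact label text; keep labels with more than one index."""
    indices_by_label = defaultdict(list)
    for index, label in rows:
        indices_by_label[label].append(index)
    return {
        label: indices
        for label, indices in indices_by_label.items()
        if len(indices) > 1
    }


# Hypothetical stand-in for rows of the SignOrSymptomAllInOne worksheet
rows = [
    (282, "altered voluntary motor or sensory function: With special sensory symptom"),
    (785, "altered voluntary motor or sensory function: With special sensory symptom"),
    (925, "Flashbacks"),
    (916, "Flashbacks"),
    (1, "a made-up label that appears only once"),
]

duplicates = find_duplicate_labels(rows)
for label, indices in duplicates.items():
    # Same layout as the list above: "text" : comma separated indices
    print('"{0}" : {1}'.format(label, ", ".join(str(i) for i in indices)))
```

The comparison is on the exact string, so two rows only count as the same entity when their text is identical character for character.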


For the types of redundancies I think you're talking about (I might call them overlaps), I suggest posting a separate issue or milestone (or more than one) for each one that you think is a problem needing solving.

shnizzedy commented 6 years ago
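def write_turtle(subject, predicates):
    """
    Function to write one or more rdf statements in terse triple format.

    Parameters
    ----------
    subject: string
        subject of all triples in these statements

    predicates: iterable of 2-tuples
        predicate: string
            nth property

        object: string
            nth object

    Returns
    -------
    turtle: string
        all statements about the subject, joined by " ;" and ending in " ."
    """
    # Join each (predicate, object) pair with a space, then chain the pairs
    # with " ;" so that they all share the same subject.
    return "{0} {1} .".format(
        subject,
        " ;\n\t".join(
            [" ".join(predicate) for predicate in predicates]
        )
    )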
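
A short usage sketch of the function above. The function is restated compactly so the example runs on its own, and the `mhdb:` prefixed names, the class name `mhdb:SignOrSymptom`, and the choice of predicates are assumptions for illustration only, not the project's actual vocabulary.

```python
def write_turtle(subject, predicates):
    # Restated from the comment above so that this example is self-contained
    return "{0} {1} .".format(
        subject,
        " ;\n\t".join(" ".join(pair) for pair in predicates)
    )


# Hypothetical subject and (predicate, object) pairs
statement = write_turtle(
    "mhdb:Flashbacks",
    [
        ("rdf:type", "mhdb:SignOrSymptom"),
        ("rdfs:label", '"Flashbacks"'),
    ],
)
print(statement)
```

Each pair after the first goes on its own tab-indented line after a " ;", and the whole statement ends with " .", so all pairs share the one subject.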
shnizzedy commented 6 years ago

being resolved in #53 and #54