Closed rosibaj closed 4 years ago
Thanks! @beckyjackson will make those changes, then add all the cohorts from https://github.com/jamesaoverton/IHCC/blob/master/data/member_cohorts.csv
I guess there's 67 rows, for 66 cohorts.
@beckyjackson one additional request: change the available_data_types format (edited as No. 3 above).
@rosibaj - for consistency, would something like this work for you, or do you want the lowest level to always be a list?
"questionnaire/survey data": {
"lifestyle and behaviours": {
"alcohol": {},
"nutrition": {},
"sleep": {},
"tobacco": {}
}, ...
@beckyjackson No, what you have proposed above would not work unless each of the "alcohol": {}, entries has additional elements in the ontologies that are assigned afterwards. So I think the best assumption here is:
Yes,
For many children of 'questionnaire/survey data', there are next-level children, like this example:
"questionnaire/survey data": {
"lifestyle and behaviours": [
"alcohol",
"tobacco"
]
}
But the problem comes if a cohort only looks at signs and symptoms from questionnaire/survey data, because this one doesn't have any children:
"questionnaire/survey data": [
"signs and symptoms"
]
How would you like the data to show up in the above case, where only 'signs and symptoms' shows up under 'questionnaire/survey data'?
Thanks for your help!
EDIT: one idea would be to put null on any that don't have children, but that might cause the same problem as the empty dictionary.
@beckyjackson It's imperative that the JSON structure be exactly the same across cohorts. This means that we definitely cannot have "questionnaire/survey data" as both an array and an object.
This is a harmonization issue. These are the options I can see for now:
Option 2) seems the better one to me. We can use a structure like this:
Cohort 1 (has children)
"questionnaire/survey data": {
"lifestyle_and_behaviours": [
"alcohol",
"tobacco"
],
"physiological_measurements": [
"height",
"weight"
],
"general_variables": null
}
Cohort 2 (has no children), so populate the general variables array with those values.
"questionnaire/survey data": {
"lifestyle_and_behaviours": null,
"physiological_measurements": null,
"general_variables": ["signs and symptoms"] <-- groups all cases where there are no children
}
Cohort 3 (also has no children)
"questionnaire/survey data": {
"lifestyle_and_behaviours": null,
"physiological_measurements": null,
"general_variables": ["any_potential_value"] <-- groups all cases where there are no children
}
Note that the structure should remain the same in all cases. It's not necessary to populate the null fields (that can be taken care of automatically), but it is important that the basic structure is identical in all cohort documents.
If there are other cases of cohorts having non-child entities in other objects, this same methodology can be used.
Does this make sense?
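The "taken care of automatically" step could look something like the sketch below. It is only an illustration: the `harmonize` helper and the `TEMPLATE_KEYS` list are hypothetical names, and the three keys are simply the ones used in the cohort examples above, not the full CINECA structure.

```python
import json

# Hypothetical template: the second-level keys every cohort document
# should carry. These three names come from the examples in this thread.
TEMPLATE_KEYS = [
    "lifestyle_and_behaviours",
    "physiological_measurements",
    "general_variables",
]


def harmonize(section, template_keys=TEMPLATE_KEYS):
    """Return a copy of `section` with every template key present.

    Keys the cohort did not supply are filled with None (JSON null),
    so all cohort documents share one shape. Keys that are not in the
    template are dropped.
    """
    return {key: section.get(key) for key in template_keys}


cohort_2 = {"general_variables": ["signs and symptoms"]}
print(json.dumps(harmonize(cohort_2), indent=2))
```

With this approach a cohort only needs to supply the fields it actually has, and the identical shape is enforced in one place.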
I want to make sure I'm interpreting this correctly:
If a cohort looks at only signs and symptoms, they get:
"questionnaire/survey data": {
"general_variables": ["signs and symptoms"]
}
If a cohort looks at just medication (not worrying about the sub-categories, so this gets null: it has children, they are just not applicable for this cohort) and signs and symptoms, they get:
"questionnaire/survey data": {
"medication": null,
"general_variables": ["signs and symptoms"]
}
Finally, if a cohort were to look at posology from medication, they would get:
"questionnaire/survey data": {
"medication": ["posology"]
}
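The three cases above can be expressed as a small function. This is a sketch under assumptions, not the generator actually used for the cohort data: `build_section` and `HIERARCHY` are hypothetical names, and the hierarchy holds only the two categories mentioned in this comment.

```python
# Hypothetical hierarchy: parent category -> list of child terms.
# An empty list marks a category that has no children.
HIERARCHY = {
    "medication": ["posology"],
    "signs and symptoms": [],
}


def build_section(collected, hierarchy=HIERARCHY):
    """Build one cohort's section from the set of terms it collects."""
    section = {}
    childless = []
    for parent, children in hierarchy.items():
        if not children:
            # Categories without children are grouped under general_variables.
            if parent in collected:
                childless.append(parent)
            continue
        picked = [child for child in children if child in collected]
        if picked:
            section[parent] = picked
        elif parent in collected:
            # Broad category collected, but no specific child term.
            section[parent] = None
    if childless:
        section["general_variables"] = childless
    return section


print(build_section({"signs and symptoms"}))
print(build_section({"medication", "signs and symptoms"}))
print(build_section({"posology"}))
```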
@beckyjackson I think that looks correct based on your description of cohorts!
Great - following this pattern, I regenerated the cohort data for what we currently have. If you have a chance, can you take a look at it and make sure it's what you expect? https://github.com/jamesaoverton/IHCC/blob/data-update/data/cohort-data.json
Thanks so much!
@beckyjackson I have reviewed. The structure looks great, but there is still one small issue in the naming of fields:
Can we please remove spaces/special characters from field names? For example:
(spaces)-
and
@beckyjackson sorry for the multiple comments! That was a result of the GitHub incident!
No worries! I couldn't even respond haha. I updated the file in the PR to remove the spaces and special characters.
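For reference, the renaming could be done with something like the sketch below. The helper names are hypothetical and the rule (every run of non-alphanumeric characters becomes one underscore) is an assumption, chosen because it turns "questionnaire/survey data" into the "questionnaire_survey_data" form used later in this thread.

```python
import re


def normalize_key(name):
    """Replace each run of non-alphanumeric characters with one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")


def normalize_keys(node):
    """Recursively rename dictionary keys; list values are left untouched."""
    if isinstance(node, dict):
        return {normalize_key(k): normalize_keys(v) for k, v in node.items()}
    return node


doc = {"questionnaire/survey data": {"lifestyle and behaviours": ["alcohol"]}}
print(normalize_keys(doc))
```

Only keys are renamed here, so values such as "signs and symptoms" keep their spaces.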
@rosibaj We're hoping that https://github.com/jamesaoverton/IHCC/blob/master/data/full-cohort-data.json is exactly what you want. If you have any trouble with it, please reopen this issue.
@jamesaoverton It seems I don't have permission to reopen this issue. There are 2 small things. Is it possible to have these fixed?
cirulation_and_respiration is supposed to be circulation_and_respiration? (misspelled circulation)
Hi @rosibaj - thanks for catching that typo, that's an easy fix to make!
As for your second point, I'm not sure I understand what you're asking.
In the current structure, if a cohort looks at a broad category that has children, but the cohort doesn't care about the children, we give that category a null to keep the structure the same. For example, if a cohort looks at weight:
"questionnaire_survey_data": {
...
"anthropometry": ["weight"],
...
}
But if they just look at the broad category of "anthropometry", they get:
"questionnaire_survey_data": {
...
"anthropometry": null,
...
}
If we were to include all fields, I worry you'd end up with two separate cases that get null values. I think you want all the top-level categories, correct? But if a cohort looks at just the broad category, it gets null. Then, it would also get null if it doesn't look at it at all.
Perhaps I'm not understanding your request correctly, so could you provide an example? Thank you so much!
@beckyjackson
I understand the distinction that you are making. In the long run, I think a better way to do this is to actually assign a value (e.g. Not gathered) to be explicit about this. However, that's a discussion for a later time!
For the sake of the demo, can we just have the data formatted like this for now:
"questionnaire_survey_data": {
...
"anthropometry": ["weight"],
...
}
But if they just look at the broad category of "anthropometry", rather than putting null, make it an empty array:
"questionnaire_survey_data": {
...
"anthropometry": [],
...
}
Sure, that's not a problem. Then, do you want the upper-level broad categories to all be displayed, and have null if they are not collected?
@beckyjackson yes, but populated as empty data. Using biosample as an example, I would expect to see this in the "Genomics England / 100,000 Genomes Project" cohort document:
"biosample": {
"sample_type": [ ]
},
I'm sorry, I'm still a bit confused. Going back to the anthropometry data: say they don't collect any questionnaire/survey data. Are you saying you would still want to display all the fields that appear in the CINECA structure, even though they don't have anything mapped to this:
"questionnaire_survey_data": {
...
"anthropometry": [],
...
}
How does this differ from the case where they do collect anthropometry data, but don't have something mapped to the children (currently displayed as "anthropometry": null, but will be changed to "anthropometry": [])?
@beckyjackson
In your examples, you are making the assumption that "anthropometry": [], and "anthropometry": null mean different things. In fact, in the way that you are translating this information, they do mean different things.
In the faceted search display (based on an Elasticsearch index) there is no functional difference between a null and an empty field. Both would be an empty facet in the display (meaning the same thing: this data does not exist for this filter). Inconsistent documents are, however, harder to work with, which is why we prefer to at least have the empty data state.
I think that a possible solution to this is to include the category as a value of itself (see below). For example:
"questionnaire_survey_data": {
...
"anthropometry": ["No Data"],
...
}
"questionnaire_survey_data": {
...
"anthropometry": ["anthropometry"],
...
}
"questionnaire_survey_data": {
...
"anthropometry": ["weight", "height"],
...
}
However, this constitutes a larger data modelling question that is not in the scope of this demo and warrants more thought.
For the demo purposes, can we continue with the empty array display?
I changed the null values to empty arrays and fixed the typo - please see the data file here.
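A change like this can be sketched as a small recursive pass over a cohort document. The function name is hypothetical and this is not necessarily how the file in the repository was regenerated; it just shows the null-to-empty-array rule agreed on for the demo.

```python
def nulls_to_empty_lists(node):
    """Return a copy of `node` with every None replaced by an empty list."""
    if node is None:
        return []
    if isinstance(node, dict):
        return {key: nulls_to_empty_lists(value) for key, value in node.items()}
    # Lists of terms and any other values are returned unchanged.
    return node


doc = {"questionnaire_survey_data": {"anthropometry": None}}
print(nulls_to_empty_lists(doc))
```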
@rosibaj We've updated https://github.com/jamesaoverton/IHCC/blob/master/data/full-cohort-data.json with those changes.
Based on the April 17th meeting, we will want to do one update of the data before the next demo.
These updates will include:
From the JSON provided, we had to fix a couple things: