AndrewSchork closed this issue 3 years ago.
https://docs.google.com/spreadsheets/d/1JowLmxixDu7oDYDtG984UZNU8HOxlFOZJq8bHD8jJ4E/edit?usp=sharing
review and make suggestions
When I implemented #112 I added the available values from the document you provided, so we already have support for checking that study_inHouseData can only contain the allowed values.
Here's what that part of the schema looks like for this field:
study_inHouseData:
  description: |
    If iPSYCH data, UKBiobank, or some other in house data set that we analyze,
    is in this study then this is very important to mark.
    Consider checking PMID in external inventories.
    List of studies to watch out for is provided in the ontology doc.
    - Ontology: https://docs.google.com/spreadsheets/d/1qghudJelGssaTbe8CDAOHOk7fhpyDAwEKGkOBMqGb3M/
    - External inventories: https://docs.google.com/spreadsheets/d/1NtSyTscFL6lI5gQ_00bm0reoT6yS2tDB3SHhgM7WwSE/
  type: "string"
  enum:
    - "none"
    - "iPSYCH2012"
    - "iPSYCH2015"
    - "UKB"
    - "GEMS"
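For anyone checking metadata by hand, the enum rule amounts to roughly the following. This is a minimal Python sketch, not the validation code from #112: it assumes each metadata file has already been parsed into a dict (for example with a YAML loader), and the function name `check_in_house_data` is hypothetical. It treats an absent field as valid, while a present field must be a string drawn from the enum.

```python
# Allowed values mirrored from the enum in the schema (assumed to stay in sync by hand).
ALLOWED_IN_HOUSE_DATA = {"none", "iPSYCH2012", "iPSYCH2015", "UKB", "GEMS"}


def check_in_house_data(metadata: dict) -> list:
    """Return a list of error messages for study_inHouseData (empty if valid).

    Hypothetical helper: `metadata` is one parsed metadata file as a dict.
    """
    if "study_inHouseData" not in metadata:
        # Field omitted entirely: nothing to validate.
        return []
    value = metadata["study_inHouseData"]
    if not isinstance(value, str):
        # e.g. an empty YAML value parses to None, which is not a string.
        return [f"study_inHouseData must be a string, got {type(value).__name__}"]
    if value not in ALLOWED_IN_HOUSE_DATA:
        return [f"study_inHouseData has disallowed value {value!r}"]
    return []
```

With this sketch, `check_in_house_data({"study_inHouseData": "missing"})` returns one error, because "missing" is not in the enum.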
The values listed in the enum property are the allowed values. Also note that instead of using the value "missing", you simply leave the study_inHouseData field out of the metadata file.
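In code that writes metadata files, the "leave it out instead of writing missing" convention could look like this. It is a minimal sketch under the assumption that metadata is handled as a Python dict before being written out; `with_in_house_data` is a hypothetical helper, and `None` stands in for "value unknown".

```python
from typing import Optional


def with_in_house_data(metadata: dict, value: Optional[str]) -> dict:
    """Return a copy of `metadata` with study_inHouseData set, or omitted if unknown.

    Hypothetical helper: `value=None` means the in-house data status is unknown.
    """
    result = dict(metadata)
    if value is None:
        # Unknown: drop the key entirely rather than writing "missing".
        result.pop("study_inHouseData", None)
    else:
        result["study_inHouseData"] = value
    return result
```

Returning a copy keeps the caller's dict unchanged, so the same base record can be reused for several studies.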
I think the enum looks good, but we should rename the field from study_inHouseData to study_includedCohorts. If we release the pipeline, the label "in-house data" might not be appropriate. @AndrewSchork, what do you think?
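If existing metadata files need to follow the rename, a migration step might look like the sketch below. This is an assumption about how old files could be upgraded, not what #133 necessarily does; `rename_in_house_field` is a hypothetical name, and it refuses to guess when a file already contains both keys.

```python
OLD_KEY = "study_inHouseData"
NEW_KEY = "study_includedCohorts"


def rename_in_house_field(metadata: dict) -> dict:
    """Return a copy of `metadata` with OLD_KEY renamed to NEW_KEY (hypothetical helper)."""
    result = dict(metadata)
    if OLD_KEY in result:
        if NEW_KEY in result:
            # Both names present: ambiguous, so leave the decision to a human.
            raise ValueError(f"metadata has both {OLD_KEY} and {NEW_KEY}")
        result[NEW_KEY] = result.pop(OLD_KEY)
    return result
```

Files that already use the new name, or never had the field, pass through unchanged.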
Seems logical. Go for it.
Alright, I'll make it happen.
PR ready for review: #133
construct an ontology for study_inHouseData