Closed pappewaio closed 3 years ago
Wow, this is great feedback :1st_place_medal:
During the datacore meeting we also spoke about adding titles for each field, instead of using their unique keys.
Some thoughts:
How much flexibility do we have in the design of the page? I'm wondering if there is a way to make it more compact, where we show minimal information unless it is requested - hidden under an info button, for example. See this:
I am fine with titles instead of variable names. Some suggestions:
cleansumstats_metafile_user // User
cleansumstats_metafile_date // Date -> could be auto generated when clicking the download?
path_sumStats // GWAS sumstats:
path_readMe // Sumstats documentation:

study_PMID // Pubmed ID for the publication associated with sumstats:
study_Title // Publication title:
study_Year // Publication year:
path_pdf // Publication PDF:
path_supplementary // Publication supplementary information:

study_PhenoDesc // Free description of the trait associated with the GWAS sumstats
study_PhenoCode // Standardized phenotype code

study_FilePortal // URL for GWAS sumstats repository
study_FileURL // URL for direct download
study_AccessDate // Date of download
study_Use // Are the GWAS sumstats publicly shared?

study_includedCohorts // Contributing GWAS cohorts:
study_Ancestry // Ancestry of GWAS cohorts:
study_Gender // Gender of GWAS cohorts:
study_PhasePanel // Phasing reference panel:
study_PhaseSoftware // Phasing software:
study_ImputePanel // Imputation reference panel:
study_ImputeSoftware // Imputation software:
study_Array // Genotyping array:
study_Notes // Special considerations and notes:

stats_TraitType // GWAS trait type:
stats_Model // GWAS statistical model:
stats_TotalN // Total sample size
stats_CaseN // Case sample size
stats_ControlN // Control sample size
stats_GCMethod // Approach to genomic inflation correction (GC):
stats_GCValue // GC adjustment factor
stats_Notes // Special considerations and notes:

col_CHR // Chromosome:
col_POS // Position:
col_SNP // SNP Identifier:
col_EffectAllele // Statistical effect reference allele (EA):
col_OtherAllele // Other allele (OA):
col_BETA // Per allele effect (i.e., regression coefficient, beta):
col_SE // Standard error of beta:
col_OR // Odds ratio (OR):
col_ORL95 // Lower 95% confidence bound of OR:
col_ORU95 // Upper 95% confidence bound of OR:
col_Z // Hypothesis test statistic (e.g., Z, Wald, t):
col_P // P-value
col_N // Per SNP total sample size
col_CaseN // Per SNP case sample size
col_ControlN // Per SNP control sample size
col_INFO // Imputation quality score
col_EAF // EA frequency in the GWAS cohorts
col_OAF // OA frequency in the GWAS cohorts
col_Direction // Cohort effect directions (e.g., +++)
col_Notes // Special considerations and notes:
General Notes:
cleansumstats_version: The version should be filled in automatically by the pipeline. It's more about tracing what was done than annotating the raw sumstats file.
study_Use: Perhaps if the answer is no to the above question, a prompt appears to fill in restrictions and an "owner" or "contact"?
study/stats_notes: Do we need two notes sections? Could we concatenate answers from other sections?
col_AFREQ: Deprecated by EAF and OAF - we can remove it.
How much flexibility do we have in the design of the page? I'm wondering if there is a way to make it more compact, where we show minimal information unless it is requested - hidden under an info button, for example. See this:
100%, because I ended up writing my own form implementation :smile:
The existing implementations were heavily entangled with different JavaScript frameworks, so using them would have made it really hard to do custom forms and fields. So I decided to write my own implementation, since it was trivial.
Having the form opens up a new world when it comes to notes like "study/stats_notes". @rzetterberg would it be possible to add a "default note" button to many fields and collect notes from different sections? Then we could remove the specific study/stats notes
would it be possible to add a "default note" button to many fields and collect notes from different sections?
Yes, no problem! All field widgets can access the data structure that contains all field values. So you can easily pick values from other fields for auto-fill.
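To make the idea concrete, here is a minimal sketch of collecting per-field notes into one text. It is not the form's actual code: the `form_values` layout, the `note` key and the example values are all assumptions made up for illustration.

```python
from typing import Mapping, Optional


def collect_notes(form_values: Mapping[str, Mapping[str, Optional[str]]]) -> str:
    """Join the per-field notes into one text block, one line per field."""
    lines = []
    for field_name, field in form_values.items():
        note = (field.get("note") or "").strip()
        if note:  # skip fields that have no note
            lines.append(f"{field_name}: {note}")
    return "\n".join(lines)


# Hypothetical form state: each field stores its value and an optional note.
form_values = {
    "stats_TotalN": {"value": "1000", "note": "Effective N, not raw N"},
    "stats_Model": {"value": "logistic", "note": ""},
    "study_Array": {"value": "example array", "note": "Two array versions were merged"},
}

print(collect_notes(form_values))
```

With something like this, the separate study/stats notes fields could be replaced by one generated summary, as long as every widget can read the shared field values.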
Right now, the fields stats_ControlN, stats_CaseN and stats_TotalN have the type number. This means that you can enter real numbers and that will be valid.
But they should be natural numbers, right? You can't have a study where you used 1000.5 cases, right?
Yes, decimals don't make sense for the "N" variables.
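A rough sketch of what the stricter check could look like, assuming the schema is JSON Schema (where `"type": "integer"` with `"minimum": 0` expresses a natural number). The `SAMPLE_SIZE_SCHEMA` name and the helper function are hypothetical, not taken from the form.

```python
# Hypothetical schema fragment for stats_TotalN, stats_CaseN and stats_ControlN.
SAMPLE_SIZE_SCHEMA = {"type": "integer", "minimum": 0}


def is_valid_sample_size(value: object) -> bool:
    """Return True only for whole numbers that are not below the minimum."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so exclude it explicitly
        return False
    if isinstance(value, float):
        # whole-valued floats such as 1000.0 are accepted in this sketch
        return value.is_integer() and value >= SAMPLE_SIZE_SCHEMA["minimum"]
    return isinstance(value, int) and value >= SAMPLE_SIZE_SCHEMA["minimum"]


for candidate in (1000, 1000.5, -1):
    print(candidate, is_valid_sample_size(candidate))
```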
study_PMID should not be a required field
Sorry, I'm a bit late to this.
There is important information here that we do need to "require" somehow - How do we cite this data when we use it?
- PMID: ensures we can cite the data when used
- If no PMID: a DOI/preprint link does the same
- If no preprint link: we probably need a contact or data owner. Perhaps one solution is that if there is no PMID or preprint link, the data can not be listed as public, but is set to restricted and a contact (name, email) is required.
citing data is critical! :-)
What we can do is set the field to required again and add a third option that represents studies that do not have a PubMed ID or a DOI link:
This third option could be any type of reference, but it forces the user to make a conscious decision about which reference to use. This will hopefully avoid situations where people do have a PubMed ID or DOI link but forget to fill it in because the field is not required.
we probably need a contact or data owner. Perhaps one solution is that if there is no PMID or preprint link, the data can not be listed as public, but is set to restricted and a contact (name, email) is required.
The third option I suggested above could also be used to fill that in, if we'd like.
Sounds good to have it like that. Then people will feel forced to enter the correct ID
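For illustration, a small sketch of the "required, but one of three kinds" idea. The kind names (`pmid`, `doi`, `other`) and the format checks are assumptions; in particular the DOI pattern is only a rough plausibility check, not a full validator.

```python
import re
from typing import List

# Hypothetical names for the three tabs/options of the reference field.
REFERENCE_KINDS = ("pmid", "doi", "other")


def validate_reference(kind: str, value: str) -> List[str]:
    """Return a list of error messages; an empty list means the reference is accepted."""
    if kind not in REFERENCE_KINDS:
        return [f"unknown reference kind: {kind!r}"]
    value = value.strip()
    errors = []
    if not value:
        # the field is required regardless of which option was chosen
        errors.append("a reference is required")
    elif kind == "pmid" and not re.fullmatch(r"[0-9]+", value):
        errors.append("a PubMed ID consists of digits only")
    elif kind == "doi" and not re.fullmatch(r"10\.[0-9]{4,9}/\S+", value):
        errors.append("a DOI looks like 10.<registrant>/<suffix>")
    return errors


print(validate_reference("pmid", "12345678"))
print(validate_reference("doi", "not-a-doi"))
print(validate_reference("other", ""))
```

The "other" kind accepts any non-empty text, which is also where a contact or data owner could be recorded.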
I've updated the metadata form so that you can create a metadata file either from scratch or from an existing metadata file.
I have fixed most of the feedback from this issue, can you please have a look at the current version of the form and tell me if you are content with the changes or if we should change something else? @pappewaio @AndrewSchork
Here's the URL: https://biopsyk.github.io/metadata/
You should see something like this:
If you don't, you need to clear the browser cache so that the latest files are downloaded.
Amazing! A few comments/suggestions:
Publication ID - can the pubmed ID be the default (first) tab?
Are the GWAS sumstats publicly shared?*
If restricted is selected, we need a place to report an owner/contact email
Genotyping Array* (not required)
I would like to edit/modify some of the ontologies - names in the drop down menus, descriptions, etc.
Is there a place I can do that? In the coding formats? Or is it best to put them in an intermediate place for you to update?
I have a list of adjustments:
- This is a bit confusing, so just give an example, e.g., Gadin_bioinf2018.pdf
- Give an example: "A genome-wide association study of shared risk across psychiatric disorders implicates gene regulation during fetal neurodevelopment"
- Give example
- This is a small number of options? Don't we have more in our inventory already?
- Why is this not a drop-down menu but something that needs to be included extra?
Please see my answers below to your questions. All requests without questions have been added as tasks at the end of this comment.
Andrew: Publication ID - can the pubmed ID be the default (first) tab?
Yes, the tabs are rendered in the order that the options appear in the schema, so it's just a matter of moving the "PubMed ID" option first.
Andrew: If restricted is selected, we need a place to report an owner/contact email
Joeri: 6. Building on what Andrew said. After somebody said "restricted data" this can only be finalised when the study controller is provided as well. Else you risk having datasets that are reported restricted and nobody knows why when people move out of the IBP.
We could show additional fields when this option is selected. What additional fields would you like? Two new required fields, one for name and one for email?
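As a sketch of how the conditional requirement could behave: the extra fields only become required once the restricted option is chosen. The field names `contact_name` and `contact_email`, the option values, and the loose email pattern are assumptions for illustration only.

```python
import re
from typing import Dict, List

# Deliberately loose check: something@something.tld
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_study_use(form: Dict[str, str]) -> List[str]:
    """Require a contact name and email only when study_Use is 'restricted'."""
    errors: List[str] = []
    if form.get("study_Use") != "restricted":
        return errors  # nothing extra is required for other choices
    if not form.get("contact_name", "").strip():
        errors.append("contact_name is required for restricted data")
    if not EMAIL_PATTERN.match(form.get("contact_email", "").strip()):
        errors.append("contact_email must be a valid email address for restricted data")
    return errors


print(validate_study_use({"study_Use": "restricted"}))
print(validate_study_use({
    "study_Use": "restricted",
    "contact_name": "Jane Doe",
    "contact_email": "jane.doe@example.org",
}))
```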
Andrew: I would like to edit/modify some of the ontologies - names in the drop down menus, descriptions, etc. Is there a place I can do that? In the coding formats? Or is it best to put them in an intermediate place for you to update?
Yes, we can set up your local computer with a copy of the web form and the schema, then you can edit the schema locally and preview the web form after each change. When you are done, you can commit the changed schema file to GitHub.
I'll reach out to you on slack later today when I have prepared the web form for local use.
Joeri: 4. Genotyping Array This is a small number of options? Don't we have more in our inventory already?
I took these options from the Google Doc that was provided in the original metadata template file (https://docs.google.com/spreadsheets/d/1qghudJelGssaTbe8CDAOHOk7fhpyDAwEKGkOBMqGb3M/edit#gid=321787909)
Basically, all dropdown options you see in the form were taken from that document.
Joeri: 5. Contributing GWAS cohorts Why is this not a drop-down menu but something that needs to be included extra?
The way this field is supposed to work is that the user should be able to select one or many of these values:
- "none"
- "iPSYCH2012"
- "iPSYCH2015"
- "UKB"
- "GEMS"
But I haven't finished implementing drop downs with multiple selections yet, so it's using the standard way of showing array fields, which is a "New" button and a list of values.
study_PMID field
Add uniqueItems: true to field study_includedCohorts so that it can be used as a drop down with multiple selection
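Roughly, the multi-select field could be described and checked like this. The schema fragment below is a guess at how study_includedCohorts might look as JSON Schema (an array of enum strings with `uniqueItems`), written as a Python dict; the helper function is hypothetical.

```python
from typing import List

ALLOWED_COHORTS = ["none", "iPSYCH2012", "iPSYCH2015", "UKB", "GEMS"]

# Assumed schema fragment for study_includedCohorts.
INCLUDED_COHORTS_SCHEMA = {
    "type": "array",
    "items": {"type": "string", "enum": ALLOWED_COHORTS},
    "uniqueItems": True,
}


def validate_cohorts(selection: List[str]) -> List[str]:
    """Check that every selected cohort is allowed and that none is repeated."""
    errors = []
    allowed = INCLUDED_COHORTS_SCHEMA["items"]["enum"]
    for cohort in selection:
        if cohort not in allowed:
            errors.append(f"unknown cohort: {cohort}")
    if INCLUDED_COHORTS_SCHEMA["uniqueItems"] and len(set(selection)) != len(selection):
        errors.append("each cohort may be selected only once")
    return errors


print(validate_cohorts(["iPSYCH2012", "UKB"]))
print(validate_cohorts(["UKB", "UKB"]))
```

With `uniqueItems` set, a widget can treat the array as a set of ticked options instead of a free list of values.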
This will revolutionise the whole metafile concept, and I am excited about the quality we will be able to offer. I took a look at the most recent version that @rzetterberg sent on slack and tried to fill it in and see if I could come up with feedback, and here it is:
[x] 1. cleansumstats_version Maybe we should only accept the actual versions that exist instead of versions that possibly can exist?
[x] 2. cleansumstats_metafile_user Add info in the description that first and last name are appreciated.
[x] 3. cleansumstats_metafile_date We could perhaps limit this to not allow dates that have already passed? Or even better, maybe let the machine detect the date and use that when the user hits "create metadata", which means that this field will disappear.
[x] 4. path_sumStats This should, as we have discussed previously, be changed to a filename instead of a path (also explicitly write in the description not to put the full path). We could write the unix command to gzip a file to help novice users. Right now it seems to allow filenames that don't have .gz.
[x] 5. path_readMe Like for the sumstats, we should remove the absolute path option
[x] 6. path_pdf Like for the sumstats, we should remove the absolute path option
[x] 7. path_supplementary Like for the sumstats, we should remove the absolute path option. Add in the description that multiple supplementary files are allowed (even though it seems obvious from the +New button), and that they can be in any format.
[x] 8. study_Title Add that pre-print titles are also allowed (see study_PMID for a text we could recycle).
[x] 9. study_PMID I can see that there are two tabs, but in my browser there is no text on them and nothing really happens when I switch. I actually don't understand this: "If missing, record extra data in studyUse." We should also give an example of what URI format is.
[x] 10. study_Year Seems totally ok. Maybe add a limit to not put in a year that is ahead in time.
[x] 11. study_PhenoDesc We could perhaps drop this sentence? Consider checking external inventories for the PMID to see if this has already been coded and you agree with the description, augmenting additional info where necessary.
[x] 12. study_PhenoCode Is it possible to use three drop down lists like GWAS ATLAS has, one each for domain, chapter and subchapter? That could really help with finding the right phenoCode quickly.
[x] 13. study_FilePortal Remove this text? I don't understand what it means: "Maybe in GWAS atlas reference." Maybe give an example of what URI format is. I could not write "www.externalpage.com". Should we really require the http:// part? Or is it possible to autofill somehow?
[x] 14. study_FileURL Same here, maybe remove "Maybe in GWAS atlas reference."
[x] 15. study_AccessDate Remove in description that it needs to be in ISO-8601 format as it will be autoconverted now. Could also set a limit to not include future dates.
[x] 16. study_Use change from please provide a details description to please provide a detailed description
[x] 17. study_includedCohorts Need multiple choice, or the ability to add more than one entry from the drop down list. I don't understand this text, or why we have the links just beneath: "Consider checking PMID in external inventories. List of studies to watch out for is provided in the ontology doc."
[x] 18. study_Ancestry Even if we don't support more ancestries than these five right now, maybe we could have one list for superpopulations and one for subpopulations, so that you can select several from each if you want...
[ ] 19. study_PhasePanel Maybe we should have a version number also. The most commonly used today is phase3 version5, right? And is phase2 actually accessible and used? I might have to figure that out myself..
[ ] 20. study_PhaseSoftware We could have two lists, one for the tool name and one for its version (could use same two list system for study_PhasePanel)
[ ] 21. study_ImputePanel We could have two lists, one for the tool name and one for its version (could use same two list system for study_PhasePanel)
[ ] 22. study_ImputeSoftware We could have two lists, one for the tool name and one for its version (could use same two list system for study_PhasePanel). Also, shapeit cannot do imputation :)
[ ] 23. study_Array We might need more options for the array type. Although, there are perhaps more array types than what is worth putting into this list.. hmmm..
[x] 24. study_Notes Add "if for example a dataset has restriction, add important restriction info here"
Ok I stop here for now 📟
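A possible sketch for points 4 and 15 above (filename instead of path, must end in .gz, no future dates). The function names are made up for illustration and the rules are only my reading of the checklist, not what the form currently does.

```python
from datetime import date
from typing import Optional


def is_valid_sumstats_filename(name: str) -> bool:
    """Accept a bare gzipped filename such as 'sumstats.txt.gz', not a path.

    Hint for novice users: `gzip sumstats.txt` produces sumstats.txt.gz.
    """
    has_directory = "/" in name or "\\" in name
    # require at least one character before the .gz suffix
    return not has_directory and name.endswith(".gz") and len(name) > len(".gz")


def is_not_in_future(value: date, today: Optional[date] = None) -> bool:
    """Return True when the date is today or earlier."""
    today = today or date.today()
    return value <= today


print(is_valid_sumstats_filename("sumstats.txt.gz"))
print(is_valid_sumstats_filename("/data/sumstats.txt.gz"))
print(is_not_in_future(date(2020, 1, 1)))
```

The same date check would also cover the limit suggested for study_Year, by comparing against the current year instead of the full date.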