Change metadata schema according to web form feedback

pappewaio commented 3 years ago

This will revolutionise the whole metafile concept, and I am excited about the quality we will be able to offer. I took a look at the most recent version that @rzetterberg sent on slack and tried to fill it in and see if I could come up with feedback, and here it is:

[x] 1. cleansumstats_version Maybe we should only accept the actual versions that exist instead of versions that possibly can exist?
[x] 2. cleansumstats_metafile_user Add a note in the description that first and last name are appreciated.
[x] 3. cleansumstats_metafile_date We could perhaps limit this to not allow dates that have already passed? Or even better, maybe let the machine detect the date and use that when the user hits "create metadata", which means that this field will disappear.
[x] 4. path_sumStats As we have discussed previously, this should be changed to a filename instead of a path (the description should also explicitly say not to put the full path). We could write the unix command to gzip a file to help novice users. Right now it seems to allow filenames that don't end in .gz.
[x] 5. path_readMe Like for the sumstats, we should remove the absolute path option
[x] 6. path_pdf Like for the sumstats, we should remove the absolute path option
[x] 7. path_supplementary Like for the sumstats, we should remove the absolute path option. Add in the description that multiple supplementary files are allowed (even though it seems obvious from the +New button), and that they can be in any format.
[x] 8. study_Title Add that pre-print titles are also allowed (see study_PMID for a text we could recycle).
[x] 9. study_PMID I can see that there are two tabs, but in my browser there is no text on them and nothing really happens when I switch. I actually don't understand this: "If missing, record extra data in studyUse". We should also give an example of what URI format is.
[x] 10. study_Year Seems totally ok. Maybe add a limit to not put in a year that is ahead in time.
[x] 11. study_PhenoDesc We could perhaps drop this sentence? "Consider checking external inventories for the PMID to see if this has already been coded and you agree with the description, augmenting additional info where necessary."
[x] 12. study_PhenoCode Is it possible to use three drop-down lists like GWAS ATLAS has, one each for domain, chapter and subchapter? That could really help in finding the right phenoCode quickly.
[x] 13. study_FilePortal Remove this text? I don't understand what it means: "Maybe in GWAS atlas reference". Maybe give an example of what URI format is. I could not write "www.externalpage.com"; should we really require the http:// part? Or is it possible to autofill somehow?
[x] 14. study_FileURL Same here, maybe remove "Maybe in GWAS atlas reference".
[x] 15. study_AccessDate Remove in description that it needs to be in ISO-8601 format as it will be autoconverted now. Could also set a limit to not include future dates.
[x] 16. study_Use Change "please provide a details description" to "please provide a detailed description".
[x] 17. study_includedCohorts Needs multiple choice, or a way to add more than one value from the drop-down list. I don't understand this text, or why we have the links just beneath: "Consider checking PMID in external inventories. List of studies to watch out for is provided in the ontology doc."
[x] 18. study_Ancestry Even if we don't support more ancestries than these five right now, maybe we could have one list for superpopulations and one for subpopulations, and let you select several from each if you want...
[ ] 19. study_PhasePanel Maybe we should have a version number also. The most commonly used today is phase3 version5, right? And is phase2 actually accessible, and used? I might have to figure that out myself..
[ ] 20. study_PhaseSoftware We could have two lists, one for the tool name and one for its version (could use same two list system for study_PhasePanel)
[ ] 21. study_ImputePanel We could have two lists, one for the tool name and one for its version (could use same two list system for study_PhasePanel)
[ ] 22. study_ImputeSoftware We could have two lists, one for the tool name and one for its version (could use same two list system for study_PhasePanel) Also shapeit cannot do imputation :)
[ ] 23. study_Array We might need more options for the array type. Although, there are perhaps more array types than what is worth putting into this list.. hmmm..
[x] 24. study_Notes Add "if for example a dataset has restriction, add important restriction info here"

Ok I stop here for now 📟
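
Several of the points above (items 4, 10 and 15) boil down to simple validation rules. Below is a minimal Python sketch of what such checks could look like. The function names are hypothetical and this is not how the web form itself is implemented.

```python
from datetime import date
from typing import List, Optional


def check_sumstats_filename(name: str) -> List[str]:
    """Return the problems found in a path_sumStats value (empty list if none)."""
    problems = []
    # Item 4: expect a bare filename, not an absolute or relative path.
    if "/" in name or "\\" in name:
        problems.append("use the filename only, not a path")
    # Item 4: the file should be gzipped, e.g. with `gzip sumstats.txt`.
    if not name.endswith(".gz"):
        problems.append("filename should end with .gz")
    return problems


def is_not_in_future(value: date, today: Optional[date] = None) -> bool:
    """Items 10 and 15: reject years or access dates that lie ahead in time."""
    today = today or date.today()
    return value <= today
```

Passing `today` explicitly is only there to make the date check easy to test; in the form it would default to the current date.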

rzetterberg commented 3 years ago

Wow, this is great feedback :1st_place_medal:

rzetterberg commented 3 years ago

During the datacore meeting we also spoke about adding titles for each field, instead of using their unique keys.

AndrewSchork commented 3 years ago

Some thoughts:

How much flexibility do we have in the design of the page? I'm wondering if there is a way to make it more compact, where we show minimal information unless it is requested - hidden under an info button, for example. See this:

I am fine with titles instead of variable names. Some suggestions:

cleansumstats_metafile_user // User
cleansumstats_metafile_date // Date -> could be auto generated when clicking the download?
path_sumStats // GWAS sumstats:
path_readMe // Sumstats documentation:

study_PMID // Pubmed ID for the publication associated with sumstats:
study_Title // Publication title:
study_Year // Publication year:
path_pdf // Publication PDF:
path_supplementary // Publication supplementary information:

study_PhenoDesc // Free description of the trait associated with the GWAS sumstats
study_PhenoCode // Standardized phenotype code

study_FilePortal // URL for GWAS sumstats repository
study_FileURL // URL for direct download
study_AccessDate // Date of download
study_Use // Are the GWAS sumstats publicly shared?

study_includedCohorts // Contributing GWAS cohorts:
study_Ancestry // Ancestry of GWAS cohorts:
study_Gender // Gender of GWAS cohorts:
study_PhasePanel // Phasing reference panel:
study_PhaseSoftware // Phasing software:
study_ImputePanel // Imputation reference panel:
study_ImputeSoftware // Imputation software:
study_Array // Genotyping array:
study_Notes // Special considerations and notes:

stats_TraitType // GWAS trait type:
stats_Model // GWAS statistical model:
stats_TotalN // Total sample size
stats_CaseN // Case sample size
stats_ControlN // Control sample size
stats_GCMethod // Approach to genomic inflation correction (GC):
stats_GCValue // GC adjustment factor
stats_Notes // Special considerations and notes:

col_CHR // Chromosome:
col_POS // Position:
col_SNP // SNP Identifier:
col_EffectAllele // Statistical effect reference allele (EA):
col_OtherAllele // Other allele (OA):
col_BETA // Per allele effect (i.e., regression coefficient, beta):
col_SE // Standard error of beta:
col_OR // Odds ratio (OR):
col_ORL95 // Lower 95% confidence bound of OR:
col_ORU95 // Upper 95% confidence bound of OR:
col_Z // Hypothesis test statistic (e.g., Z, Wald, t):
col_P // P-value
col_N // Per SNP total sample size
col_CaseN // Per SNP case sample size
col_ControlN // Per SNP control sample size
col_INFO // Imputation quality score
col_EAF // EA frequency in the GWAS cohorts
col_OAF // OA frequency in the GWAS cohorts
colDirection // Cohort effect directions (e.g., +++)
col_Notes // Special considerations and notes:

General Notes:

cleansumstats_version The version should be filled in by the pipeline, automatically. It's more about tracing what was done, rather than annotating the raw sumstats file.

study_Use Perhaps if the answer is no to the above question, then a prompt appears to fill in restrictions and an "owner" or "contact"?

study/stats_notes Do we need two notes sections? Could we concatenate answers from other sections?

col_AFREQ Deprecated by EAF and OAF - we can remove

rzetterberg commented 3 years ago

> How much flexibility do we have in the design of the page? I'm wondering if there is a way to make it more compact, where we show minimal information unless it is requested - hidden under an info button, for example. See this:

100%, because I ended up writing my own form implementation :smile:

The existing implementations were very entangled in different javascript-frameworks, so using them would make it really hard to do custom forms and fields. So I decided to write my own implementation since it was trivial.

pappewaio commented 3 years ago

Having the form opens up a new world when it comes to notes like "study/stats_notes". @rzetterberg would it be possible to add a "default note" button to many fields and collect notes from different sections? Then we could remove the specific study/stats notes

rzetterberg commented 3 years ago

> would it be possible to add a "default note" button to many fields and collect notes from different sections?

Yes, no problem! All field widgets can access the data structure that contains all field values. So you can easily pick values from other fields for auto-fill.
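
As a rough illustration of collecting notes from several fields into one place, here is a small Python sketch. It assumes the per-field notes are available as a plain mapping from field key to note text; that structure and the function name are hypothetical, not the form's actual data model.

```python
from typing import Dict


def collect_notes(field_notes: Dict[str, str]) -> str:
    """Join the non-empty per-field notes into one block of text."""
    lines = []
    for field, note in field_notes.items():
        note = note.strip()
        # Skip fields where the user did not write anything.
        if note:
            lines.append(f"{field}: {note}")
    return "\n".join(lines)
```

With something like this, the separate study/stats notes fields could be replaced by one generated summary.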

rzetterberg commented 3 years ago

Right now, the fields stats_ControlN, stats_CaseN and stats_TotalN have the type number. This means that you can enter real numbers and that will be valid.

But they should be natural numbers, right? You can't have a study where you used 1000.5 cases, right?

AndrewSchork commented 3 years ago

Yes, decimals don't make sense for the "N" variables.
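
A minimal Python sketch of the stricter check. If the schema is a JSON Schema (an assumption, the thread does not say so), the equivalent would be `"type": "integer"` together with `"minimum": 0`. The helper name is hypothetical, and whether 0 should be accepted is left open here.

```python
def is_valid_sample_size(value) -> bool:
    """Accept only whole, non-negative numbers for the stats_*N fields."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, int) and value >= 0
```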

rzetterberg commented 3 years ago

[x] From discussion on slack: study_PMID should not be a required field

AndrewSchork commented 3 years ago

Sorry, I'm a bit late to this.

There is important information here that we do need to "require" somehow - How do we cite this data when we use it?

PMID, ensures we can cite the data when used

if no PMID

doi preprint link does the same

if no preprint link

we probably need a contact or data owner. Perhaps one solution is that if there is no PMID or preprint link, the data can not be listed as public, but is set to restricted and a contact (name, email) is required.

citing data is critical! :-)
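
The fallback chain above could be sketched in Python roughly as follows. The function name, argument names and the returned keys are all hypothetical; only the order PMID, then preprint link, then contact comes from the comment.

```python
from typing import Optional


def resolve_reference(
    pmid: Optional[str],
    doi: Optional[str],
    contact_name: Optional[str],
    contact_email: Optional[str],
) -> dict:
    """Decide how the data can be cited and whether it may be listed as public."""
    if pmid:
        return {"cite_by": "pmid", "may_be_public": True}
    if doi:
        return {"cite_by": "doi", "may_be_public": True}
    # No PMID and no preprint link: a contact is required and the data is restricted.
    if contact_name and contact_email:
        return {"cite_by": "contact", "may_be_public": False}
    raise ValueError("no PMID, preprint link or contact: the data cannot be cited")
```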

rzetterberg commented 3 years ago

What we can do is set the field to required again and add a third option that represents studies that do not have a pubmed ID or a DOI link:

[Screenshot: 2021-03-01-163753_937x250_scrot]

This third option could be any type of reference, but it forces the user to make a conscious decision on what reference to use. This will hopefully avoid situations where people do have a pubmed ID or DOI link but forget to fill it in because the field is not required.

rzetterberg commented 3 years ago

> we probably need a contact or data owner. Perhaps one solution is that if there is no PMID or preprint link, the data can not be listed as public, but is set to restricted and a contact (name, email) is required.

The third option I suggested above could also be used to fill that in, if we'd like.

pappewaio commented 3 years ago

Sounds good to have it like that. Then people will feel forced to enter the correct ID

rzetterberg commented 3 years ago

I've updated the metadata form so that you can create a metadata file either from scratch or from an existing metadata file.

I have fixed most of the feedback from this issue, can you please have a look at the current version of the form and tell me if you are content with the changes or if we should change something else? @pappewaio @AndrewSchork

Here's the URL: https://biopsyk.github.io/metadata/

You should see something like this:

[Screenshot: 2021-03-16-100205_1482x916_scrot]

If you don't, you need to clear the browser cache so that the latest files are downloaded.

AndrewSchork commented 3 years ago

Amazing! A few comments/suggestions:

Publication ID - can the pubmed ID be the default (first) tab?

Are the GWAS sumstats publicly shared?*
If restricted is selected, we need a place to report an owner/contact email

Genotyping Array* (not required)

I would like to edit/modify some of the ontologies - names in the drop down menus, descriptions, etc.

Is there a place I can do that? In the coding formats? Or is it best to put them in an intermediate place for you to update?

joejeroe commented 3 years ago

I have a list of adjustments:

Publication PDF Name to the study PDF as referenced in the study_PMID field. This file must be located in the same directory as the metadata file.

-This is a bit confusing, so just give an example e.g., Gadin_bioinf2018.pdf

Publication title * Title of the PMID'd publication associated with the stats. Should be one line (no new line characters) and no tabs. All other characters are acceptable.

-Give an example: "A genome-wide association study of shared risk across psychiatric disorders implicates gene regulation during fetal neurodevelopment"

URL for GWAS sumstats repository and URL for direct download

-Give example

Genotyping Array

-This is a small number of options? Don't we have more in our inventory already?

Contributing GWAS cohorts

-Why is this not a drop-down menu but something that needs to be added separately?

Building on what Andrew said. After somebody said "restricted data" this can only be finalised when the study controller is provided as well. Else you risk having datasets that are reported restricted and nobody knows why when people move out of the IBP.

rzetterberg commented 3 years ago

Please see my answers below to your questions. All requests without questions have been added as tasks at the end of this comment.

Andrew: Publication ID - can the pubmed ID be the default (first) tab?

Yes, the tabs are rendered in the order that the options appear in the schema, so it's just a matter of moving the "pubmed ID" option first.
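
Since the tab order follows the option order in the schema, the change amounts to reordering a list. A tiny Python sketch, assuming each option is a dict with a `title` key (a hypothetical shape, the real schema may differ):

```python
from typing import List


def move_option_first(options: List[dict], title: str) -> List[dict]:
    """Return the options with the one matching `title` moved to the front."""
    chosen = [o for o in options if o.get("title") == title]
    rest = [o for o in options if o.get("title") != title]
    # The relative order of all other options is preserved.
    return chosen + rest
```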

Andrew: If restricted is selected, we need a place to report an owner/contact email

Joeri: 6. Building on what Andrew said. After somebody said "restricted data" this can only be finalised when the study controller is provided as well. Else you risk having datasets that are reported restricted and nobody knows why when people move out of the IBP.

We could show additional fields when this option is selected. What additional fields would you like? Two new required fields, one for name and one for email?
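
One way to express "extra required fields when restricted is selected" is sketched below in Python. The contact field keys and the stored value `"restricted"` are assumptions made for illustration; the thread only proposes a name and an email.

```python
from typing import Dict, List

# Hypothetical field keys for the proposed name and email fields.
CONTACT_FIELDS = ["study_ContactName", "study_ContactEmail"]


def missing_contact_fields(values: Dict[str, str]) -> List[str]:
    """Return the contact fields still missing when study_Use is restricted."""
    # Assumption: the restricted option is stored as the string "restricted".
    if values.get("study_Use") != "restricted":
        return []
    return [f for f in CONTACT_FIELDS if not values.get(f, "").strip()]
```

The form could refuse to generate the metadata file while this list is non-empty.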

Andrew: I would like to edit/modify some of the ontologies - names in the drop down menus, descriptions, etc. Is there a place I can do that? In the coding formats? Or is it best to put them in an intermediate place for you to update?

Yes, we can set up your local computer with a copy of the web form and the schema, then you can edit the schema locally and preview the web form after each change. When you are done, you can commit the changed schema file to GitHub.

I'll reach out to you on slack later today when I have prepared the web form for local use.

Joeri: 4. Genotyping Array This is a small number of options? Don't we have more in our inventory already?

I took these options from the Google Doc that was provided in the original metadata template file (https://docs.google.com/spreadsheets/d/1qghudJelGssaTbe8CDAOHOk7fhpyDAwEKGkOBMqGb3M/edit#gid=321787909)

Basically, all dropdown options you see in the form were taken from that document.

Joeri: 5. Contributing GWAS cohorts Why is this not a drop-down menu but something that needs to included extra?

The way this field is supposed to work is that the user should be able to select one or many of these values:

- "none"
- "iPSYCH2012"
- "iPSYCH2015"
- "UKB"
- "GEMS"

But I haven't finished implementing drop downs with multiple selections yet, so it's using the standard way of showing array fields, which is a "New" button and a list of values.

Todos

[x] Set "pubmed id" as first option in study_PMID field
[x] Remove "Genotyping Array" field from list of required fields
[x] Add example "Gadin_bioinf2018.pdf" to "Publication PDF" field
[x] Add example "A genome-wide association study of shared risk across psychiatric disorders implicates gene regulation during fetal neurodevelopment" to "Publication title" field
[x] Add example URLs to "URL for GWAS sumstats repository and URL for direct download" field
[x] ~Implement drop downs with multiple selections and use that for the "Contributing GWAS cohorts" field~ Add uniqueItems: true to field study_includedCohorts so that it can be used as a drop down with multiple selection
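
For reference, the combination of a fixed value list and `uniqueItems: true` on study_includedCohorts corresponds to a check like the following Python sketch. The function name is hypothetical; the allowed values are the five listed earlier in this comment.

```python
from typing import List

ALLOWED_COHORTS = ["none", "iPSYCH2012", "iPSYCH2015", "UKB", "GEMS"]


def check_included_cohorts(selected: List[str]) -> List[str]:
    """Return the problems found in a study_includedCohorts selection."""
    problems = []
    # uniqueItems: true means each cohort may be selected only once.
    if len(set(selected)) != len(selected):
        problems.append("duplicate cohorts selected")
    unknown = [c for c in selected if c not in ALLOWED_COHORTS]
    if unknown:
        problems.append("unknown cohorts: " + ", ".join(unknown))
    return problems
```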

BioPsyk / cleansumstats

Change metadata schema according to web form feedback #143
