Open earlEBI opened 2 weeks ago
@karatugo could this please be worked on along with the other yaml issues, thanks
@jiyue1214 will check how many of these have been resolved already and also deal with the quotation marks.
Based on the status of the studies on 27th September 2024:

| Number of Studies | Study Type in YAML File |
|---|---|
| 803 | '' |
| 912 | 'GWAS-SSFv1.0' |
| 35,823 | GWAS-SSFv1.0 |
| 16,042 | non-GWAS-SSF |
| 1 | 'Non-GWAS-SSF' |
| 890 | Non-GWAS-SSF |
| 56,677 | pre-GWAS-SSF |
Notably, we have not placed any restriction on the value of the `file_type` field.
The harmonisation queue script determines the file type by checking whether the value of `file_type` starts with `GWAS-SSF` or `pre-GWAS-SSF`. If it starts with neither, the file type is set to "not_harm" automatically.
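To make the consequence of the prefix check concrete, here is a minimal sketch of that rule applied to the values in the table. It is not the actual harmonisation queue code: the function name `classify_file_type` and the labels returned for the two matching cases are assumptions (only "not_harm" comes from the description above), and it assumes the quotation marks are a literal part of the stored value.

```python
def classify_file_type(file_type: str) -> str:
    """Classify a raw file_type value by prefix (hypothetical helper)."""
    # The two prefixes cannot shadow each other:
    # "pre-GWAS-SSF" does not start with "GWAS-SSF".
    for prefix in ("GWAS-SSF", "pre-GWAS-SSF"):
        if file_type.startswith(prefix):
            return prefix  # assumed label; the real script may use another
    # Anything else (empty, quoted, "Non-GWAS-SSF", ...) falls through.
    return "not_harm"


if __name__ == "__main__":
    samples = ["GWAS-SSFv1.0", "pre-GWAS-SSF", "non-GWAS-SSF",
               "Non-GWAS-SSF", "", "'GWAS-SSFv1.0'", " GWAS-SSF v1.0"]
    for value in samples:
        print(repr(value), "->", classify_file_type(value))
```

Under these assumptions, a value with literal quotation marks or leading whitespace would not match either prefix and would end up as "not_harm", which is why cleaning the values matters.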
Hi @earlEBI, I extracted the first two rows from each sumstats file, reformatted them as "header: value" pairs, and identified each file's type based on them. Here are the results. Could you please help me check whether they are correct?
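For anyone wanting to reproduce the "header: value" preview, a rough sketch follows. It assumes a tab-delimited file whose first row is the header; the function name `header_value_preview` and the sample columns are illustrative only, not the script that was actually used.

```python
import csv
import io


def header_value_preview(handle) -> list[str]:
    """Pair each header name with the value from the first data row."""
    reader = csv.reader(handle, delimiter="\t")  # assumes tab-separated input
    header = next(reader)      # row 1: column names
    first_row = next(reader)   # row 2: first data record
    return [f"{name}: {value}" for name, value in zip(header, first_row)]


if __name__ == "__main__":
    # Made-up two-row sample standing in for a real sumstats file.
    sample = io.StringIO("chromosome\tp_value\n1\t0.05\n")
    for line in header_value_preview(sample):
        print(line)
```

A file with fewer than two rows raises `StopIteration` in this sketch, so real use would need to handle that case.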
Hi, @karatugo. Since there are more than 800 studies, could you please suggest any best practices for updating both the DB and meta-yaml files?
Thank you for your support and help.
Empty yaml file types
Found 785 yamls with empty file types - GCSTs listed in attached .txt file: yamls-with-empty-filetypes.txt (Unfortunately these are not all GWAS-SSF, so file headers would need to be checked to determine the correct file type.)

Incorrect file_type formatting
Also found several yamls with file_type ' GWAS-SSFv1.0' or ' GWAS-SSF v1.0' (with single quotation marks and leading whitespace, e.g. GCST90319314). The quotation marks and whitespace should be removed so that it reads e.g. file_type: GWAS-SSFv1.0. (Usage also varies between 'GWAS-SSFv1.0' and 'GWAS-SSF v1.0', with an added space. Could this also be cleaned up?)
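The clean-up requested here could look something like the sketch below. It only covers the three problems named in this issue (surrounding quotation marks, leading or trailing whitespace, and the space before the version); the function name `normalise_file_type` is hypothetical, and it deliberately leaves the non-GWAS-SSF / Non-GWAS-SSF capitalisation untouched, since no canonical form has been agreed.

```python
import re


def normalise_file_type(raw: str) -> str:
    """Return a cleaned file_type value (hypothetical helper)."""
    # Drop surrounding whitespace, then quotation marks, then any
    # whitespace that was hiding inside the quotes.
    value = raw.strip().strip("'\"").strip()
    # Collapse "GWAS-SSF v1.0" to "GWAS-SSFv1.0".
    return re.sub(r"^GWAS-SSF\s+v", "GWAS-SSFv", value)


if __name__ == "__main__":
    for raw in ["' GWAS-SSFv1.0'", "' GWAS-SSF v1.0'", "'GWAS-SSFv1.0'",
                "GWAS-SSFv1.0", "pre-GWAS-SSF", "''"]:
        print(repr(raw), "->", repr(normalise_file_type(raw)))
```

Empty values stay empty after normalisation, so those yamls would still need their file headers checked separately.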