EBISPOT / goci

GWAS Catalog Ontology and Curation Infrastructure
Apache License 2.0

Empty yaml file types (and incorrect formatting) #1424

Open earlEBI opened 2 weeks ago

earlEBI commented 2 weeks ago

Empty yaml file types

Found 785 yamls with empty file types. The GCSTs are listed in the attached .txt file: yamls-with-empty-filetypes.txt. (Unfortunately these are not all GWAS-SSF, so file headers would need to be checked to determine the correct file type.)

Incorrect file_type formatting

Also found several yamls with file_type ' GWAS-SSFv1.0' or ' GWAS-SSF v1.0', i.e. with single quotation marks and leading whitespace (e.g. GCST90319314). The quotation marks and whitespace should be removed so that it reads, e.g., file_type: GWAS-SSFv1.0. (There is also some variability between 'GWAS-SSFv1.0' and 'GWAS-SSF v1.0' with an added space. Could this also be cleaned up?)
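The cleanup described above could be sketched roughly as follows. This is only an illustration, not code from the goci repository: the function name normalise_file_type is hypothetical, and treating GWAS-SSFv1.0 (no space) as the canonical spelling is an assumption based on the example in this comment.

```python
import re


def normalise_file_type(raw: str) -> str:
    """Hypothetical helper: tidy a file_type value read from a meta-yaml.

    Assumes the canonical spelling is 'GWAS-SSFv1.0' (no space), as in the
    example given in the issue.
    """
    # Drop surrounding whitespace, then literal quote characters, then any
    # whitespace that was hidden inside the quotes.
    value = raw.strip().strip("'\"").strip()
    # Collapse 'GWAS-SSF v1.0' to 'GWAS-SSFv1.0'; the ^ anchor leaves
    # values such as 'pre-GWAS-SSF' untouched.
    value = re.sub(r"^(GWAS-SSF)\s+(v\d)", r"\1\2", value)
    return value
```

An empty or quotes-only value comes back as an empty string, so those studies would still need their file headers checked by hand.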

ljwh2 commented 1 week ago

@karatugo, could this please be worked on along with the other yaml issues? Thanks.

ljwh2 commented 6 days ago

@jiyue1214 will check how many of these have been resolved already and also deal with the quotation marks.

jiyue1214 commented 4 days ago

Based on the status of the studies on 27th September 2024:

Number of studies    file_type value in YAML file
803                  ''
912                  'GWAS-SSFv1.0'
35,823               GWAS-SSFv1.0
16,042               non-GWAS-SSF
1                    'Non-GWAS-SSF'
890                  Non-GWAS-SSF
56,677               pre-GWAS-SSF

Notably, we have not placed any restriction on the value of the file_type field. The harmonisation queue script determines the file type by checking whether the value of file_type starts with GWAS-SSF or pre-GWAS-SSF. If it starts with neither, the file type is set to "not_harm" automatically.
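For reference, the prefix check described above behaves roughly like the sketch below. This is not the actual harmonisation queue code: the function name and the two non-"not_harm" return labels are assumptions made for illustration.

```python
def harmonisation_type(file_type: str) -> str:
    """Hypothetical sketch of the prefix-based detection described above."""
    # Only the "not_harm" label comes from the description; the other two
    # return values are placeholders.
    if file_type.startswith("pre-GWAS-SSF"):
        return "pre-GWAS-SSF"
    if file_type.startswith("GWAS-SSF"):
        return "GWAS-SSF"
    # Empty strings, 'non-GWAS-SSF', 'Non-GWAS-SSF' and values with a
    # leading space all end up here.
    return "not_harm"
```

Under this logic a value such as " GWAS-SSFv1.0" (leading space) would be treated as "not_harm", assuming the whitespace survives when the yaml is read, which is why the formatting cleanup matters.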

jiyue1214 commented 4 days ago

Hi @earlEBI, I extracted the first two rows (the header and the first data row) from each sumstats file, reformatted them as "header: value" pairs, and identified the file type from those. Here are the results. Could you please help me check whether they are correct?
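The "header: value" step could look something like this minimal sketch. It assumes the file content is tab-delimited text that has already been read (and decompressed, if needed); the function name is hypothetical, and the rules for deciding the file type from the pairs are not shown because they are not described here.

```python
import csv
import io


def first_row_as_pairs(text: str, delimiter: str = "\t") -> dict:
    """Hypothetical helper: pair each header with its value in the first data row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)     # row 1: column names
    first_row = next(reader)  # row 2: first data record
    return dict(zip(header, first_row))


# Illustrative input only; the column names are example values.
sample = "chromosome\tbase_pair_location\tp_value\n1\t12345\t0.05\n"
for name, value in first_row_as_pairs(sample).items():
    print(f"{name}: {value}")
```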

Hi, @karatugo. Since there are more than 800 studies, could you please suggest any best practices for updating both the DB and meta-yaml files?

Thank you for your support and help,