EBISPOT / goci

GWAS Catalog Ontology and Curation Infrastructure
Apache License 2.0

Yaml file showing different genotyping technology (exome wide seq) from that of template (exome genotyping array) #1203

Closed Santhi1901 closed 8 months ago

Santhi1901 commented 9 months ago

For studies with genotyping technology recorded as 'Exome genotyping array' in the curation template, the yaml file shows 'Exome-wide sequencing'.

For example:

  1. Submission id: https://www.ebi.ac.uk/gwas/deposition/submission/635a4733c6aae2000190d5a5 with GCST ids: GCST90188372-GCST90188387

  2. Submission id: https://www.ebi.ac.uk/gwas/deposition/submission/63a2257dff81b70001d1f8d9 with GCST id: GCST90244057

  3. Submission id: https://www.ebi.ac.uk/gwas/deposition/submission/63d42d26ff81b7000155a9a0 with GCST ids: GCST90245765-GCST90245776 (this submission is undergoing curation now, and I am not sure whether 'Exome genotyping array' in the template is correct, but for this submission too the template and the yaml show different genotyping technologies)

In some cases, for studies with 'Exome genotyping array' in the template, the genotyping technology is missing from the yaml altogether, e.g. submission https://www.ebi.ac.uk/gwas/deposition/submission/633bf885c6aae200010a2c71 with GCST id: GCST90162550

ljwh2 commented 9 months ago

@jiyue1214 Please confirm how the genotyping technology appears in the latest version of yaml files for these studies. It should be exome genotyping array, but Santhi noticed it currently appears as exome-wide sequencing.

ljwh2 commented 9 months ago

Also noting that the genotyping technology is correct in the search UI. For example, GCST90188372 shows "exome genotyping array" in the UI (https://www.ebi.ac.uk/gwas/studies/GCST90188372) but "Exome-wide sequencing" in the yaml.

jiyue1214 commented 9 months ago

In the YAML files that will be generated by the revised script, genotyping_technology for the following studies will show:

  1. GCST90188372-GCST90188387: - Exome genotyping array (GCST90188384 does not exist)
  2. GCST90244057: - Exome genotyping array
  3. GCST90245765-GCST90245776: - Exome genotyping array
  4. GCST90162550: - Exome genotyping array

Please note that the yaml files on the FTP will not be updated until we regenerate the yaml files for these studies.

ljwh2 commented 9 months ago

@jiyue1214 thanks for confirming. Leaving the ticket open so @Santhi1901 can confirm when the new files are generated

jiyue1214 commented 8 months ago

New meta-yaml files in the staging folder are waiting to be synced to the FTP folder and show the genotyping technology as "Exome genotyping array" (example: /nfs/production/parkinso/spot/gwas/prod/data/summary_statistics/GCST90188001-GCST90189000/GCST90188372/GCST90188372_buildGRCh37.tsv-meta.yaml).
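
For anyone who wants to spot-check the regenerated files once they are synced, here is a minimal sketch of how the comparison could be scripted. It is not the script used to generate the yaml files. It assumes the meta-yaml stores the value under a top-level `genotyping_technology` key, either as a block list (as the "- Exome genotyping array" entries above suggest) or as a plain scalar. The function names `expand_accession_range`, `genotyping_technologies` and `find_mismatches` are hypothetical, and the parsing is a deliberately simple stdlib-only stand-in for a real YAML parser.

```python
import re

PREFIX = "GCST"


def expand_accession_range(spec):
    """Expand 'GCST90188372-GCST90188387' into a list of single accessions.

    A single accession such as 'GCST90244057' is returned as a one-item list.
    """
    parts = spec.strip().split("-")
    if len(parts) == 1:
        return [parts[0]]
    start, end = parts
    width = len(start) - len(PREFIX)  # keep any zero padding
    first, last = int(start[len(PREFIX):]), int(end[len(PREFIX):])
    return [f"{PREFIX}{n:0{width}d}" for n in range(first, last + 1)]


def genotyping_technologies(yaml_text):
    """Return the values found under a top-level 'genotyping_technology' key.

    Assumption: the value is either a scalar on the same line or a block
    list of '- value' lines directly below the key.
    """
    values = []
    in_block = False
    for line in yaml_text.splitlines():
        if in_block:
            item = re.match(r"\s*-\s+(.*\S)\s*$", line)
            if item:
                values.append(item.group(1).strip("'\""))
                continue
            if not line.strip():
                continue
            in_block = False  # any other line ends the list
        key = re.match(r"genotyping_technology:\s*(.*)$", line)
        if key:
            scalar = key.group(1).strip()
            if scalar:
                values.append(scalar.strip("'\""))
            else:
                in_block = True
    return values


def find_mismatches(expected, yaml_by_accession):
    """Map each accession whose yaml disagrees with `expected` to what was found.

    A missing key is reported as an empty list, which covers the case where
    the genotyping technology is absent from the yaml.
    """
    wanted = expected.casefold()
    bad = {}
    for accession, text in yaml_by_accession.items():
        found = genotyping_technologies(text)
        if [value.casefold() for value in found] != [wanted]:
            bad[accession] = found
    return bad


if __name__ == "__main__":
    # Illustrative in-memory yaml snippets, not real file contents.
    samples = {
        "GCST90188372": "genotyping_technology:\n  - Exome-wide sequencing\n",
        "GCST90244057": "genotyping_technology:\n  - Exome genotyping array\n",
        "GCST90162550": "trait_description:\n  - example\n",
    }
    print(find_mismatches("Exome genotyping array", samples))
```

In practice the text for each accession would be read from the corresponding `-meta.yaml` file, and a proper YAML parser would be a safer choice than the line matching used here.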