EBISPOT / goci

GWAS Catalog Ontology and Curation Infrastructure
Apache License 2.0

Different ancestries are not separated by a delimiter in the download CSV file #464

Open eks-ebi opened 3 years ago

eks-ebi commented 3 years ago

This is a bug in the construction of the CSV file generated from the Studies table in the GWAS Catalog UI.

Reported by Yue Ji from Aoife's group, so I will copy Yue's email below:

eks-ebi commented 3 years ago

I have recently been trying to extract studies from the GWAS Catalog website, and I think there may be a bug in the downloaded CSV file. Here are the details:

Summary: Different ancestries are not separated by a delimiter in the download CSV file.

Submit Date: 14-09-2021
Reporter: YUE JI

Platform: GWAS catalog
Operating System: macOS Big Sur (version 11.5.2)
Browser: Google Chrome

URL: https://www.ebi.ac.uk/gwas/efotraits/EFO_0003770

Description

I am interested in studies of diabetic retinopathy (EFO_0003770), so I searched for EFO_0003770 in the GWAS Catalog and downloaded a CSV file for one study (GCST007292) from the studies table.

In the downloaded CSV file, multiple ancestries in the "Discovery sample number and ancestry" column are not separated by any delimiter. The same is true of the "Replication sample number and ancestry" column.

I also tried downloading all studies for this trait, and the same problem occurred for studies "GCST007289" and "GCST003042".

I also downloaded all studies for another trait, "EFO_0004574" (https://www.ebi.ac.uk/gwas/efotraits/EFO_0004574). Again, there is no separator between multiple ancestries in these two columns.

Steps to reproduce

  1. Search for "EFO_0003770" on the GWAS Catalog main page (https://www.ebi.ac.uk/gwas/home).
  2. Click the trait of interest to go to the page https://www.ebi.ac.uk/gwas/efotraits/EFO_0003770.
  3. Go to the studies table and enter the study "GCST007292" in the search box. [Screenshot 2021-09-14 at 15 17 43]
  4. Click data export to export this study as a CSV file. [Screenshot 2021-09-14 at 15 18 21]

Expected result

    • Multiple ancestries in the "Discovery sample number and ancestry" column should be separated by a delimiter, for example "1852 African American or Afro-Caribbean; 3094 European", or displayed the same way as in the search results on the web page.

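To illustrate the expected result, here is a minimal Python sketch (not the GWAS Catalog export code). The function names `join_ancestries` and `split_concatenated` are hypothetical, the "; " separator is taken from the example above, and the splitting helper assumes that each entry is a sample count followed by an ancestry label containing no digits.

```python
import re

# Assumed separator, taken from the example in the expected result above.
DELIMITER = "; "


def join_ancestries(samples):
    """Join (count, ancestry) pairs into one delimited CSV cell value."""
    return DELIMITER.join(f"{count} {ancestry}" for count, ancestry in samples)


def split_concatenated(cell):
    """Best-effort split of a cell whose entries may lack a delimiter.

    Assumes each entry is a sample count followed by an ancestry label
    that contains no digits, so a new run of digits starts a new entry.
    """
    entries = re.findall(r"\d[\d,]*\s+\D+", cell)
    # Trim whitespace and any separator characters left at the edges.
    return [entry.strip(" ;,") for entry in entries]


if __name__ == "__main__":
    samples = [(1852, "African American or Afro-Caribbean"), (3094, "European")]
    print(join_ancestries(samples))
    print(split_concatenated("1852 African American or Afro-Caribbean3094 European"))
```

Until the export is fixed, a helper along the lines of `split_concatenated` might serve as a workaround for files that have already been downloaded, provided the digit assumption holds for the ancestry labels involved.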
eks-ebi commented 3 years ago

If a solution is found, we should inform Yue: yueji@ebi.ac.uk