EBISPOT / goci-rest

Apache License 2.0

Child trait data via API #36

Closed jiyue1214 closed 2 years ago

jiyue1214 commented 2 years ago

Hi, could I ask for suggestions on how to include child trait data in the search results for an EFO term via the API?

Here is an example of how I retrieve studies of an EFO term via API:

[Image: Screenshot 2022-05-03 at 12 55 34]

The API search returns 16 studies. However, the GWAS Catalog UI search for the same EFO term returns 99 studies, because the UI results include child trait data.

[Image: Screenshot 2022-05-03 at 12 55 45]

Could I ask for help on how to make the API's search results consistent with the UI's?

Cheers, Yue

ramiromagno commented 2 years ago

Hi @jiyue1214,

This does not address your question directly, but it might be useful to see how I do it from R with the https://github.com/ramiromagno/gwasrapidd package.

Essentially, I get the child terms for "EFO_0004327" with get_child_efo(), which uses the https://www.ebi.ac.uk/ols/api/ontologies/efo API, and then I search for GWA studies with get_studies().

You will get some warnings about missing terms. These just mean that those terms have never appeared in the GWAS Catalog, even though they are child terms of "electrocardiography".

library(gwasrapidd)

efo_trait_of_interest <- 'EFO_0004327'
efo_children <- get_child_efo(efo_id = efo_trait_of_interest)
efo_children <- c(efo_trait_of_interest, efo_children[[efo_trait_of_interest]])

eletrocard_studies <- get_studies(efo_id = efo_children)
#> Warning: The request for https://www.ebi.ac.uk/gwas/rest/api/efoTraits/
#> EFO_0600085/studies failed: response code was 404.
#> Warning in gc_request_all(resource_url = resource_url, base_url = base_url, :
#> The request for https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0600085/
#> studies failed: response code was 404.
#> Warning: The request for https://www.ebi.ac.uk/gwas/rest/api/efoTraits/
#> EFO_0020929/studies failed: response code was 404.
#> Warning in gc_request_all(resource_url = resource_url, base_url = base_url, :
#> The request for https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0020929/
#> studies failed: response code was 404.
eletrocard_studies
#> An object of class "studies"
#> Slot "studies":
#> # A tibble: 99 × 13
#>    study_id     reported_trait   initial_sample_s… replication_samp… gxe   gxg  
#>    <chr>        <chr>            <chr>             <chr>             <lgl> <lgl>
#>  1 GCST000564   Electrocardiogr… 6,543 Indian Asi… 6,243 Indian Asi… FALSE FALSE
#>  2 GCST000344   Electrocardiogr… 1,262 Kosraen in… <NA>              FALSE FALSE
#>  3 GCST000111   Electrocardiogr… 1,951 European a… <NA>              FALSE FALSE
#>  4 GCST000561   Electrocardiogr… Up to 12,670 Eur… Up to 10,352 Eur… FALSE FALSE
#>  5 GCST002542   Electrocardiogr… 2,994 Japanese a… 6,805 Korean anc… FALSE FALSE
#>  6 GCST90044268 Electrocardiogr… 272 European anc… <NA>              FALSE FALSE
#>  7 GCST90044267 Electrocardiogr… 680 European anc… <NA>              FALSE FALSE
#>  8 GCST90044269 Electrocardiogr… 3,616 European a… <NA>              FALSE FALSE
#>  9 GCST005905   Global electric… 3,057 Black indi… <NA>              FALSE FALSE
#> 10 GCST90044266 Electrocardiogr… 62,388 European … <NA>              FALSE FALSE
#> # … with 89 more rows, and 7 more variables: snp_count <int>, qualifier <chr>,
#> #   imputed <lgl>, pooled <lgl>, study_design_comment <chr>,
#> #   full_pvalue_set <lgl>, user_requested <lgl>
#> 
#> Slot "genotyping_techs":
#> # A tibble: 100 × 2
#>    study_id     genotyping_technology       
#>    <chr>        <chr>                       
#>  1 GCST000564   Genome-wide genotyping array
#>  2 GCST000344   Genome-wide genotyping array
#>  3 GCST000111   Genome-wide genotyping array
#>  4 GCST000561   Genome-wide genotyping array
#>  5 GCST002542   Genome-wide genotyping array
#>  6 GCST90044268 Genome-wide genotyping array
#>  7 GCST90044267 Genome-wide genotyping array
#>  8 GCST90044269 Genome-wide genotyping array
#>  9 GCST005905   Genome-wide genotyping array
#> 10 GCST90044266 Genome-wide genotyping array
#> # … with 90 more rows
#> 
#> Slot "platforms":
#> # A tibble: 134 × 2
#>    study_id   manufacturer
#>    <chr>      <chr>       
#>  1 GCST000564 Illumina    
#>  2 GCST000344 Affymetrix  
#>  3 GCST000111 Affymetrix  
#>  4 GCST000561 Illumina    
#>  5 GCST002542 Illumina    
#>  6 GCST005905 Affymetrix  
#>  7 GCST005905 Illumina    
#>  8 GCST010796 Affymetrix  
#>  9 GCST011010 Illumina    
#> 10 GCST011010 Affymetrix  
#> # … with 124 more rows
#> 
#> Slot "ancestries":
#> # A tibble: 219 × 4
#>    study_id     ancestry_id type        number_of_individuals
#>    <chr>              <int> <chr>                       <int>
#>  1 GCST000564             1 initial                      6543
#>  2 GCST000564             2 replication                  6243
#>  3 GCST000564             3 replication                  5370
#>  4 GCST000344             1 initial                      1262
#>  5 GCST000111             1 initial                      1951
#>  6 GCST000561             1 initial                     12670
#>  7 GCST000561             2 replication                 10352
#>  8 GCST002542             1 initial                      2994
#>  9 GCST002542             2 replication                  6805
#> 10 GCST90044268           1 initial                     67136
#> # … with 209 more rows
#> 
#> Slot "ancestral_groups":
#> # A tibble: 223 × 3
#>    study_id     ancestry_id ancestral_group
#>    <chr>              <int> <chr>          
#>  1 GCST000564             1 South Asian    
#>  2 GCST000564             2 South Asian    
#>  3 GCST000564             3 European       
#>  4 GCST000344             1 Oceanian       
#>  5 GCST000111             1 European       
#>  6 GCST000561             1 European       
#>  7 GCST000561             2 European       
#>  8 GCST002542             1 East Asian     
#>  9 GCST002542             2 East Asian     
#> 10 GCST90044268           1 European       
#> # … with 213 more rows
#> 
#> Slot "countries_of_origin":
#> # A tibble: 166 × 5
#>    study_id   ancestry_id country_name major_area region
#>    <chr>            <int> <chr>        <chr>      <chr> 
#>  1 GCST005905           1 <NA>         <NA>       <NA>  
#>  2 GCST005905           2 <NA>         <NA>       <NA>  
#>  3 GCST010796           1 <NA>         <NA>       <NA>  
#>  4 GCST011010           1 <NA>         <NA>       <NA>  
#>  5 GCST011010           2 <NA>         <NA>       <NA>  
#>  6 GCST011010           3 <NA>         <NA>       <NA>  
#>  7 GCST011010           4 <NA>         <NA>       <NA>  
#>  8 GCST003870           1 <NA>         <NA>       <NA>  
#>  9 GCST003870           2 <NA>         <NA>       <NA>  
#> 10 GCST003870           3 <NA>         <NA>       <NA>  
#> # … with 156 more rows
#> 
#> Slot "countries_of_recruitment":
#> # A tibble: 378 × 5
#>    study_id     ancestry_id country_name                     major_area region  
#>    <chr>              <int> <chr>                            <chr>      <chr>   
#>  1 GCST000564             1 U.K.                             Europe     Norther…
#>  2 GCST000564             2 U.K.                             Europe     Norther…
#>  3 GCST000564             3 U.K.                             Europe     Norther…
#>  4 GCST000344             1 Micronesia (Federated States of) Oceania    Microne…
#>  5 GCST000561             1 Iceland                          Europe     Norther…
#>  6 GCST000561             2 Iceland                          Europe     Norther…
#>  7 GCST002542             1 Japan                            Asia       Eastern…
#>  8 GCST002542             2 Republic of Korea                Asia       Eastern…
#>  9 GCST90044268           1 U.K.                             Europe     Norther…
#> 10 GCST90044267           1 U.K.                             Europe     Norther…
#> # … with 368 more rows
#> 
#> Slot "publications":
#> # A tibble: 99 × 7
#>    study_id     pubmed_id publication_date publication  title    author_fullname
#>    <chr>            <int> <date>           <chr>        <chr>    <chr>          
#>  1 GCST000564    20062061 2010-01-10       Nat Genet    Genetic… Chambers JC    
#>  2 GCST000344    19389651 2009-02-15       Heart Rhythm Genome-… Smith JG       
#>  3 GCST000111    17903306 2007-09-19       BMC Med Gen… Genome-… Newton-Cheh C  
#>  4 GCST000561    20062063 2010-01-10       Nat Genet    Several… Holm H         
#>  5 GCST002542    25055868 2014-07-23       Hum Mol Gen… Genome-… Sano M         
#>  6 GCST90044268  34737426 2021-11-04       Nat Genet    A gener… Jiang L        
#>  7 GCST90044267  34737426 2021-11-04       Nat Genet    A gener… Jiang L        
#>  8 GCST90044269  34737426 2021-11-04       Nat Genet    A gener… Jiang L        
#>  9 GCST005905    29622589 2018-04-05       J Am Heart … Genome-… Tereshchenko LG
#> 10 GCST90044266  34737426 2021-11-04       Nat Genet    A gener… Jiang L        
#> # … with 89 more rows, and 1 more variable: author_orcid <chr>
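
The 404 warnings above are expected: each one marks a child term that has no studies in the Catalog, and the remaining terms are still queried. As a rough illustration of that skip-and-continue pattern outside gwasrapidd, here is a minimal base R sketch. Note that `fetch_studies`, `collect_study_ids`, `fake_fetch` and the toy IDs are hypothetical names made up for this example; they are not part of gwasrapidd or of the Catalog API.

```r
# Collect study IDs across several EFO terms, skipping terms whose
# request fails (e.g. a 404 for a term absent from the GWAS Catalog).
# `fetch_studies` is a hypothetical function: it takes one EFO ID and
# returns a character vector of study IDs, or signals an error.
collect_study_ids <- function(efo_ids, fetch_studies) {
  per_term <- lapply(efo_ids, function(id) {
    tryCatch(
      fetch_studies(id),
      error = function(e) {
        warning("Skipping ", id, ": ", conditionMessage(e), call. = FALSE)
        character(0)
      }
    )
  })
  # The same study may come back for more than one term, so de-duplicate.
  unique(unlist(per_term, use.names = FALSE))
}

# Toy stand-in for a real request, for illustration only.
fake_fetch <- function(id) {
  known <- list(EFO_A = c("S1", "S2"), EFO_B = c("S2", "S3"))
  if (is.null(known[[id]])) stop("response code was 404")
  known[[id]]
}

ids <- suppressWarnings(
  collect_study_ids(c("EFO_A", "EFO_MISSING", "EFO_B"), fake_fetch)
)
print(ids)
```

In real use, `fake_fetch` would be replaced by a function that actually performs the request for one term.
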
jiyue1214 commented 2 years ago

Hi @ramiromagno,

It is super helpful and perfectly solves my problem. Thank you for the example script and explanations!

Yue

sprintell commented 2 years ago

Hi @jiyue1214 ,

The Catalog REST API does not include this feature at the moment: there is no endpoint for retrieving child traits. It is planned to be part of version 2 of the API, which should be released sometime in 2023. For now, the only way to do it is to query the child terms from OLS, as @ramiromagno shared. Thanks @ramiromagno for helping with your solution.
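
For anyone who wants to follow the same two-step route without gwasrapidd, the sketch below only builds the request URLs in base R; it does not send any requests. It rests on a few assumptions that should be checked against the OLS documentation: that the term IRI is `http://www.ebi.ac.uk/efo/<id>` (assumed to hold only for IDs starting with `EFO_`), that OLS expects the IRI URL-encoded twice in the path, and that `descendants` is the right sub-resource. The function names are made up for this example.

```r
# Base URLs as they appear in this thread.
ols_efo_base <- "https://www.ebi.ac.uk/ols/api/ontologies/efo"
gwas_base <- "https://www.ebi.ac.uk/gwas/rest/api"

# URL for the descendants of an EFO term in OLS.
# Assumptions: the IRI prefix, the double URL-encoding and the
# "descendants" sub-resource (see the paragraph above).
ols_descendants_url <- function(efo_id) {
  iri <- paste0("http://www.ebi.ac.uk/efo/", efo_id)
  once <- utils::URLencode(iri, reserved = TRUE)
  # repeated = TRUE forces a second pass over the already-encoded string.
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0(ols_efo_base, "/terms/", twice, "/descendants")
}

# URL for the studies annotated with one EFO term in the GWAS Catalog,
# matching the pattern visible in the 404 warnings earlier in the thread.
gwas_studies_url <- function(efo_id) {
  paste0(gwas_base, "/efoTraits/", efo_id, "/studies")
}

print(ols_descendants_url("EFO_0004327"))
print(gwas_studies_url("EFO_0004327"))
```

The idea is to request the first URL once for the parent term, then request the second URL for the parent and for each child term returned, merging the results. A term with no studies in the Catalog answers with a 404, as seen in the warnings above.
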

Best Regards

Yomi

jiyue1214 commented 2 years ago

I am looking forward to the release of version 2 of the API, and thank you for all your hard work on it. What @ramiromagno shared solves my current problem perfectly. Thanks again.