eQTL-Catalogue / eQTL-Catalogue-resources


Unable to fetch data from GWAS Catalog #3

Closed · marynias closed this 2 years ago

marynias commented 4 years ago

Hi, I am wondering if there is a bug in the tutorial here: https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/scripts/eQTL_API_usecase.html

When I try to run the following command to fetch data from the GWAS Catalog:

```r
RA_gwas_query_str <- "https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/20/associations?study_accession=GCST002318&bp_lower=45980000&bp_upper=46200000&size=1000"

gwas_data <- fetch_from_eqtl_cat_API(link = RA_gwas_query_str, is_gwas = TRUE)
```

I get the following error:

```
Error: Column `cols` must be length 847 (the number of rows) or one, not 14
```

I wonder if you can reproduce that error?
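
For reference, the same page can be fetched without the tutorial helper. A minimal sketch using httr and jsonlite; the `_embedded$associations` path is my assumption about the API's HAL-style JSON layout, not something taken from the tutorial:

```r
library(httr)
library(jsonlite)

# Fetch one page of associations from the GWAS Catalog summary statistics API
response <- GET(RA_gwas_query_str, accept_json())
stop_for_status(response)

# Parse the JSON payload; associations are assumed to sit under `_embedded`
payload <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
associations <- payload$`_embedded`$associations
length(associations)  # number of records returned on this page
```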

kauralasoo commented 4 years ago

Hi!

Thanks for reporting it! We are currently very busy with updating the eQTL Catalogue manuscript, so we're not sure when we'll be able to look at the tutorial. If you are interested in colocalisation, you might also want to check out our tabix tutorial: http://htmlpreview.github.io/?https://github.com/kauralasoo/eQTL-Catalogue-resources/blob/master/scripts/tabix_use_case.html

We've found that tabix tends to be much faster when fetching a large number of variants from a specific region.

It is also worth keeping in mind that both tutorials are intended to demonstrate how eQTL Catalogue data can be accessed, but they are not necessarily reliable enough for performing thousands of colocalisations. Internally, we have written a separate Nextflow workflow to perform colocalisation, but it assumes that the summary statistics files have been downloaded to a local file system.
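
For a flavour of the tabix tutorial, a region query from R looks roughly like this (a sketch, assuming the seqminer package is installed; the file path and region below are illustrative):

```r
library(seqminer)

# Stream only the requested region from a remote summary statistics file;
# tabix.read() returns the matching tab-delimited lines as a character vector
ftp_path <- "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/BLUEPRINT/ge/BLUEPRINT_ge_monocyte.all.tsv.gz"
lines <- tabix.read(tabixFile = ftp_path, tabixRange = "20:45980000-46200000")
head(lines)
```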

Best, Kaur

marynias commented 4 years ago

Hi Kaur,

Thanks for your reply! I usually like to go through the test examples to make sure I understand the basic usage and can get the test example running. I will have a look at the tabix example then.

The big selling point of the eQTL Catalogue, in my mind, is the fact that we don't have to download the massive eQTL summary statistics files locally. I have one GWAS with ~50 loci which I want to test for colocalisation across all the datasets in the eQTL Catalogue. Would you say that the API interface is suitable for this use case?

Maria

kauralasoo commented 4 years ago

Hi Maria,

Yes, that makes sense. If you have around 50 loci, I would probably use the command-line version of tabix to download only those regions from the FTP server and then proceed with coloc. See this example from our website:

```
tabix ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/BLUEPRINT/ge/BLUEPRINT_ge_monocyte.all.tsv.gz 20:46120612-46120613
```

You can also use curl to access the column names:

```
curl -s ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/BLUEPRINT/ge/BLUEPRINT_ge_monocyte.all.tsv.gz | zcat | head -n 1
```
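
If you then want the region as a data frame in R, one way is to attach the header to the tabix output. A rough sketch, assuming command-line tabix and curl are on your PATH and readr is installed:

```r
library(readr)

ftp_path <- "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/BLUEPRINT/ge/BLUEPRINT_ge_monocyte.all.tsv.gz"

# Column names come from the first line of the file
header_line <- system(paste("curl -s", ftp_path, "| zcat | head -n 1"), intern = TRUE)
column_names <- strsplit(header_line, "\t")[[1]]

# tabix streams only the requested region
body_lines <- system(paste("tabix", ftp_path, "20:46120612-46120613"), intern = TRUE)

# Parse the streamed lines into a data frame with the proper column names
summary_stats <- read_tsv(paste0(paste(body_lines, collapse = "\n"), "\n"),
                          col_names = column_names)
```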

Best, Kaur

marynias commented 4 years ago

Hi Kaur,

OK, thanks, that makes sense and will fit best into our existing pipeline.

marynias commented 3 years ago

Hi Kaur,

I went back to my analysis with the eQTL Catalogue just now. In the newest version of the preprint, there is a link to this tutorial: https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/tutorials/tabix_use_case.md

I am getting an error on this line:

```r
summary_stats = import_eQTLCatalogue(platelet_df$ftp_path, region, selected_gene_id = "ENSG00000163947", column_names)
```

The error is:

```
Cannot open specified tabix file: ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz
Error in strsplit(body, "\t") : non-character argument
```

I can access the file fine with curl. I am just wondering whether the seqminer package, which is used to load the region, supports streaming files from FTP?

marynias commented 3 years ago

I have a workaround for the issue above (using this function instead: https://rdrr.io/bioc/Rsamtools/man/scanTabix.html), but I am wondering if it is just me experiencing the issue.
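
For anyone hitting the same problem, the workaround looks roughly like this (a sketch; the query region below is only an approximate window around ENSG00000163947, and the column parsing is left minimal):

```r
library(Rsamtools)
library(GenomicRanges)

ftp_path <- "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz"

# scanTabix returns a named list with one character vector of lines per query region
region <- GRanges("3", IRanges(start = 56600000, end = 57100000))  # approximate window, illustrative
lines <- scanTabix(TabixFile(ftp_path), param = region)[[1]]

# Split the tab-delimited lines into a matrix of fields
fields <- do.call(rbind, strsplit(lines, "\t"))
```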

kauralasoo commented 2 years ago

Loading data with tabix directly into R is a bit tricky. I was initially using scanTabix from Rsamtools, but had issues with memory leaks when importing large tables. I then switched to seqminer and it did work for me for a while, including over FTP, but it might have its own issues.