LieberInstitute / recount3

Explore and download data from the recount3 project
http://lieberinstitute.github.io/recount3

Unable to download a few SRA studies #14

Closed prashanthi-ravichandran closed 2 years ago

prashanthi-ravichandran commented 2 years ago

I was downloading the recount gene count matrix, but for some studies, I get the following error.

sra_info[1, ]
    project organism file_source     project_home project_type n_samples
1 SRP107565    human         sra data_sources/sra data_sources       216
rse <- create_rse(sra_info[1, ])
2022-01-10 11:27:35 downloading and reading the metadata.
2022-01-10 11:27:36 caching file sra.sra.SRP107565.MD.gz.
2022-01-10 11:27:36 caching file sra.recount_project.SRP107565.MD.gz.
2022-01-10 11:27:37 caching file sra.recount_qc.SRP107565.MD.gz.
2022-01-10 11:27:37 caching file sra.recount_seq_qc.SRP107565.MD.gz.
2022-01-10 11:27:38 caching file sra.recount_pred.SRP107565.MD.gz.
Error in match.arg(project_home) : 'arg' should be one of “data_sources/sra”, “data_sources/gtex”, “data_sources/tcga”

I don't believe anyone has posted something similar. Another study where this happens is:

sra_info[82, ]
      project organism file_source     project_home project_type
414 SRP104120    human         sra data_sources/sra data_sources
    n_samples
414        43
rse <- create_rse(sra_info[82, ])
2022-01-10 11:30:03 downloading and reading the metadata.
2022-01-10 11:30:04 caching file sra.sra.SRP104120.MD.gz.
2022-01-10 11:30:04 caching file sra.recount_project.SRP104120.MD.gz.
2022-01-10 11:30:05 caching file sra.recount_qc.SRP104120.MD.gz.
2022-01-10 11:30:05 caching file sra.recount_seq_qc.SRP104120.MD.gz.
2022-01-10 11:30:06 caching file sra.recount_pred.SRP104120.MD.gz.
Error in match.arg(project_home) : 'arg' should be one of “data_sources/sra”, “data_sources/gtex”, “data_sources/tcga”

Even though the project_home is data_sources/sra in both cases, the gene count matrix doesn't seem to download.
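When many studies are downloaded in a loop, one failing study stops everything after it. A minimal sketch of a way around that wraps each call in tryCatch() and records the error message per study. It assumes sra_info is a data frame of project rows like the ones printed above; try_create_rse() and its build argument are hypothetical names for illustration, not part of recount3.

```r
# Try each study in turn and collect failures instead of stopping at the
# first error. `build` defaults to recount3::create_rse(); it is a parameter
# only so that the loop can be exercised without network access.
try_create_rse <- function(project_info, build = recount3::create_rse) {
    outcomes <- lapply(seq_len(nrow(project_info)), function(i) {
        tryCatch(
            list(
                rse = build(project_info[i, , drop = FALSE]),
                error = NA_character_
            ),
            error = function(e) list(rse = NULL, error = conditionMessage(e))
        )
    })
    names(outcomes) <- project_info$project

    # A study failed if an error message was recorded for it.
    failed <- vapply(outcomes, function(x) !is.na(x$error), logical(1))
    list(
        rse = lapply(outcomes[!failed], `[[`, "rse"),
        errors = vapply(outcomes[failed], `[[`, character(1), "error")
    )
}
```

Calling out <- try_create_rse(sra_info) and then names(out$errors) would list the studies that still need attention, with the messages themselves in out$errors.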

lcolladotor commented 2 years ago

Hi @prashanthi-ravichandran,

Thank you for your interest in recount3 and for taking the time to report this issue.

Could you use the reprex package https://reprex.tidyverse.org/ or post code that recreates the contents of sra_info[1, ] and sra_info[82, ]? That would make it easier for me to help you. Thanks! The output from sessioninfo::session_info() can also be useful. If you want, more details about this are included in the issue template at https://github.com/LieberInstitute/recount3/issues/new/choose.
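As a rough sketch of what such code could look like, the two rows can be rebuilt by hand from the values printed in the report, and dput() then emits R code that anyone can paste to get the same object. The name sra_info_subset is made up for this example; the row names 1 and 414 are taken from the printed output. The create_rse() call is left commented out because it needs network access and the recount3 package.

```r
# Rebuild the two rows from the values shown in the original report.
sra_info_subset <- data.frame(
    project = c("SRP107565", "SRP104120"),
    organism = "human",
    file_source = "sra",
    project_home = "data_sources/sra",
    project_type = "data_sources",
    n_samples = c(216L, 43L),
    row.names = c("1", "414"),
    stringsAsFactors = FALSE
)

# Print R code that recreates the object, ready to paste into an issue.
dput(sra_info_subset)

# With network access and recount3 installed, the failing call would be:
# rse <- recount3::create_rse(sra_info_subset[1, ])
```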

This video from the LIBD rstats club I run might be useful https://youtu.be/8bBo3B7N8YQ. See https://docs.google.com/spreadsheets/d/1is8dZSd0FZ9Qi1Zvq1uRhm-P1McnJRd_zxdAfCRoMfA/edit?usp=sharing or https://www.youtube.com/c/LeonardoColladoTorres/playlists for more information.

Best, Leo

ChristopherWilks commented 2 years ago

Hi @prashanthi-ravichandran,

I second what @lcolladotor said above. However, I had a suspicion about why these two studies might have failed, so I went ahead and fixed them. The issue is that neither study had any predicted tissue types. This is an optional additional feature of recount3 that apparently didn't get applied to some studies or failed for some reason.

In any case, I simply disabled the predictions, and the gene sums should now load for both of those studies. You will get a warning message, which you can ignore, e.g.:

```
The 'url' <http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/20/SRP104120/sra.recount_pred.SRP104120.MD.gz> does not exist or is not available.
```

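To see which file the warning refers to for another study, the URL can be rebuilt from the study ID. The sketch below infers the directory layout from the single URL in the warning above (in particular, that the two-character directory is the last two characters of the study ID), so treat it as an assumption rather than a documented contract. pred_url() is a hypothetical helper; in real code the package's own locate_url() function is probably the better choice.

```r
# Rebuild the prediction metadata URL for a study. The layout is inferred
# from the one URL shown in the warning and may not hold in general.
pred_url <- function(project,
                     organism = "human",
                     project_home = "data_sources/sra",
                     base_url = "http://duffel.rail.bio/recount3") {
    file_source <- basename(project_home) # e.g. "sra"
    # Assumed: the sub-directory is the last two characters of the study ID.
    shard <- substr(project, nchar(project) - 1L, nchar(project))
    paste(
        base_url, organism, project_home, "metadata", shard, project,
        sprintf("%s.recount_pred.%s.MD.gz", file_source, project),
        sep = "/"
    )
}

pred_url("SRP104120")
```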
lcolladotor commented 2 years ago

Hi Chris,

How did you disable the predictions? Is this something I need to change in the R package?

Best, Leo

PS @prashanthi-ravichandran I mentioned this issue on the LIBD rstats club today. I'll edit and upload the video soon-ish.

nmra-cwilks commented 2 years ago

@lcolladotor I just moved the prediction file sra.recount_pred.SRP104120.MD.gz on the filesystem so it wouldn't be picked up at all by recount3 (it just had the header), which turns the error into the warning above.
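For anyone wanting to check a cached metadata file for the same condition, here is a small offline sketch of detecting a header-only gzipped table. The file is created on the fly and its column names are invented for the example; they are not the real recount3 prediction columns.

```r
# Write a hypothetical header-only, gzip-compressed, tab-separated file.
pred_file <- tempfile(fileext = ".MD.gz")
con <- gzfile(pred_file, "w")
writeLines("external_id\tpred_type", con) # made-up column names
close(con)

# read.delim() reads gzip-compressed files transparently.
pred <- read.delim(pred_file)

# A header-only file yields the columns but zero rows.
is_header_only <- nrow(pred) == 0L
is_header_only

unlink(pred_file)
```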

lcolladotor commented 2 years ago

Ohh cool, thanks Chris! Sounds like we can close this issue then. Thanks!

PS Here's the new video https://youtu.be/-6sZyp0hNwU and tweet https://twitter.com/LIBDrstats/status/1487147235557232642.

Best, Leo