QuackenbushLab / yarn

YARN: Robust Multi-Tissue RNA-Seq Preprocessing and Normalization

Failed at first step #2

Open jpmam1 opened 6 years ago

jpmam1 commented 6 years ago

I failed at the first step. The example dataset (skin) worked great, but I wasn't able to load the GTEx data (failed on Mac and Ubuntu).

Any ideas on how to proceed would be very much appreciated.

Thanks, Jared


library(yarn)
obj <- downloadGTEx(type='genes',file='~/Desktop/gtex.rds')
Downloading and reading files
trying URL 'http://www.gtexportal.org/static/datasets/gtex_analysis_v6/annotations/GTEx_Data_V6_Annotations_SampleAttributesDS.txt'
Content type 'text/html' length 32619 bytes (31 KB)
==================================================
downloaded 31 KB

Parsed with column specification:
cols(
  `<!DOCTYPE html>` = col_character()
)
Warning: 1 parsing failure.
row # A tibble: 1 x 5 col     row   col  expected    actual expected   <int> <chr>     <chr>     <chr> actual 1   162  <NA> 1 columns 7 columns file # ... with 1 more variables: file <chr>

Error in pd[, "SAMPID"] : subscript out of bounds

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] yarn_1.2.0           Biobase_2.36.2       BiocGenerics_0.22.1  BiocInstaller_1.26.1
 [5] dplyr_0.7.4          purrr_0.2.4          readr_1.1.1          tidyr_0.7.2         
 [9] tibble_1.3.4         ggplot2_2.2.1        tidyverse_1.1.1     

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2           siggenes_1.50.0            mclust_5.3                
  [4] XVector_0.16.0             GenomicRanges_1.28.6       quantro_1.10.0            
  [7] base64_2.0                 bit64_0.9-7                AnnotationDbi_1.38.2      
 [10] lubridate_1.6.0            xml2_1.1.1                 codetools_0.2-15          
 [13] splines_3.4.0              mnormt_1.5-5               doParallel_1.0.11         
 [16] jsonlite_1.5               Rsamtools_1.28.0           broom_0.4.2               
 [19] annotate_1.54.0            compiler_3.4.0             httr_1.3.1                
 [22] assertthat_0.2.0           Matrix_1.2-11              lazyeval_0.2.0            
 [25] limma_3.32.10              tools_3.4.0                bindrcpp_0.2              
 [28] gtable_0.2.0               glue_1.1.1                 GenomeInfoDbData_0.99.0   
 [31] reshape2_1.4.2             doRNG_1.6.6                Rcpp_0.12.13              
 [34] cellranger_1.1.0           bumphunter_1.16.0          Biostrings_2.44.2         
 [37] multtest_2.32.0            gdata_2.18.0               preprocessCore_1.38.1     
 [40] nlme_3.1-131               rtracklayer_1.36.6         iterators_1.0.8           
 [43] psych_1.7.8                stringr_1.2.0              rvest_0.3.2               
 [46] rngtools_1.2.4             gtools_3.5.0               XML_3.98-1.9              
 [49] beanplot_1.2               edgeR_3.18.1               zlibbioc_1.22.0           
 [52] MASS_7.3-47                scales_0.5.0               hms_0.3                   
 [55] SummarizedExperiment_1.6.5 GEOquery_2.42.0            minfi_1.22.1              
 [58] RColorBrewer_1.1-2         memoise_1.1.0              downloader_0.4            
 [61] pkgmaker_0.22              biomaRt_2.32.1             reshape_0.8.7             
 [64] stringi_1.1.5              RSQLite_2.0                genefilter_1.58.1         
 [67] S4Vectors_0.14.7           foreach_1.4.3              GenomicFeatures_1.28.5    
 [70] caTools_1.17.1             BiocParallel_1.10.1        GenomeInfoDb_1.12.3       
 [73] rlang_0.1.2                pkgconfig_2.0.1            matrixStats_0.52.2        
 [76] bitops_1.0-6               nor1mix_1.2-3              lattice_0.20-35           
 [79] bindr_0.1                  GenomicAlignments_1.12.2   bit_1.1-12                
 [82] plyr_1.8.4                 magrittr_1.5               R6_2.2.2                  
 [85] IRanges_2.10.5             gplots_3.0.1               DelayedArray_0.2.7        
 [88] DBI_0.7                    haven_1.1.0                foreign_0.8-69            
 [91] survival_2.41-3            RCurl_1.95-4.8             modelr_0.1.1              
 [94] KernSmooth_2.23-15         locfit_1.5-9.1             grid_3.4.0                
 [97] readxl_1.0.0               data.table_1.10.4-2        blob_1.1.0                
[100] forcats_0.2.0              digest_0.6.12              xtable_1.8-2              
[103] illuminaio_0.18.0          openssl_0.9.7              stats4_3.4.0              
[106] munsell_0.4.3              registry_0.3               quadprog_1.5-5

--

Ubuntu 14.04:
> library(yarn)
> obj <- downloadGTEx(type='genes',file='~/gtex.rds')

Downloading and reading files 

downloaded 0 bytes

Error in download.file(url, method = method, ...) :  
  cannot download all files
In addition: Warning message:
In download.file(url, method = method, ...) : 
  URL 'https://www.gtexportal.org/static/datasets/gtex_analysis_v6/annotations/GTEx_Data_V6_Annotations_SampleAttributesDS.txt': status was 'Peer certificate cannot be authenticated with given CA certificates'

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] yarn_1.0.1           Biobase_2.34.0       BiocGenerics_0.20.0  dplyr_0.7.1
 [5] purrr_0.2.2.2        readr_1.1.1          tidyr_0.6.3          tibble_1.3.3
 [9] ggplot2_2.2.1        tidyverse_1.1.1      BiocInstaller_1.24.0

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2           siggenes_1.48.0            mclust_5.3
  [4] XVector_0.14.1             GenomicRanges_1.26.4       quantro_1.8.0
  [7] base64_2.0                 bit64_0.9-7                AnnotationDbi_1.36.2
 [10] lubridate_1.6.0            xml2_1.1.1                 codetools_0.2-15
 [13] splines_3.3.3              mnormt_1.5-5               doParallel_1.0.11
 [16] jsonlite_1.5               Rsamtools_1.26.2           broom_0.4.2
 [19] annotate_1.52.1            httr_1.2.1                 assertthat_0.2.0
 [22] Matrix_1.2-10              lazyeval_0.2.0             limma_3.30.13
 [25] tools_3.3.3                bindrcpp_0.2               gtable_0.2.0
 [28] glue_1.1.1                 reshape2_1.4.2             doRNG_1.6.6
 [31] Rcpp_0.12.11               cellranger_1.1.0           bumphunter_1.14.0
 [34] Biostrings_2.42.1          multtest_2.30.0            gdata_2.18.0
 [37] preprocessCore_1.36.0      nlme_3.1-131               rtracklayer_1.34.2
 [40] iterators_1.0.8            psych_1.7.5                stringr_1.2.0
 [43] rvest_0.3.2                rngtools_1.2.4             gtools_3.5.0
 [46] XML_3.98-1.9               beanplot_1.2               edgeR_3.16.5
 [49] zlibbioc_1.20.0            MASS_7.3-47                scales_0.4.1
 [52] hms_0.3                    SummarizedExperiment_1.4.0 GEOquery_2.40.0
 [55] minfi_1.20.2               RColorBrewer_1.1-2         memoise_1.1.0
 [58] downloader_0.4             pkgmaker_0.22              biomaRt_2.30.0
 [61] reshape_0.8.7              stringi_1.1.5              RSQLite_2.0
 [64] genefilter_1.56.0          S4Vectors_0.12.2           foreach_1.4.3
 [67] GenomicFeatures_1.26.4     caTools_1.17.1             BiocParallel_1.8.2
 [70] GenomeInfoDb_1.10.3        rlang_0.1.1                pkgconfig_2.0.1
 [73] matrixStats_0.52.2         bitops_1.0-6               nor1mix_1.2-3
 [76] lattice_0.20-35            bindr_0.1                  GenomicAlignments_1.10.1
 [79] bit_1.1-12                 plyr_1.8.4                 magrittr_1.5
 [82] R6_2.2.2                   IRanges_2.8.2              gplots_3.0.1
 [85] DBI_0.7                    haven_1.0.0                foreign_0.8-69
 [88] survival_2.41-3            RCurl_1.95-4.8             modelr_0.1.0
 [91] KernSmooth_2.23-15         locfit_1.5-9.1             grid_3.3.3
 [94] readxl_1.0.0               data.table_1.10.4          blob_1.1.0
 [97] forcats_0.2.0              digest_0.6.12              xtable_1.8-2
[100] illuminaio_0.16.0          openssl_0.9.6              stats4_3.3.3
[103] munsell_0.4.3              registry_0.3               quadprog_1.5-5
--
 
jnpaulson commented 6 years ago

It appears GTEx replaced the static locations for their data. I'm going to have to figure out a workaround for that. In the meantime, you can manually go to https://www.gtexportal.org/static/datasets/ (login required), download the files, and load them by following the script within downloadGTEx.
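
The Mac log above shows the symptom of the moved files: the old URL returns an HTML page (`Content type 'text/html'`, a single `<!DOCTYPE html>` column), which is then parsed as if it were the annotation table. A minimal sketch of a guard for manually downloaded files, assuming base R only; the helper name `looks_like_html` is hypothetical and not part of yarn:

```r
# Hypothetical helper (not part of yarn): returns TRUE when the first
# non-empty line of a file looks like an HTML page rather than a data table.
looks_like_html <- function(path, n = 5L) {
  first_lines <- trimws(readLines(path, n = n, warn = FALSE))
  first_lines <- first_lines[nzchar(first_lines)]
  if (length(first_lines) == 0L) {
    return(FALSE)
  }
  grepl("^<(!doctype|html)", first_lines[1], ignore.case = TRUE)
}

# Usage: check a manually downloaded file before parsing it.
# The file name comes from the URL in the log; the local path is an assumption.
# path <- "GTEx_Data_V6_Annotations_SampleAttributesDS.txt"
# if (looks_like_html(path)) stop("Got an HTML page, not the annotation table")
```

A check like this only catches the HTML-redirect case; the certificate error in the Ubuntu log fails earlier, inside download.file itself.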

ameya225 commented 6 years ago

I am still getting the same error as @jpmam1 when trying to download other GTEx data, apart from the example (skin dataset). @jnpaulson is there an update with the work-around for this?

DCGenomics commented 6 years ago

Are you getting it through dbGaP, or elsewhere?

On Tue, Jan 9, 2018 at 6:48 PM, Ameya Kulkarni wrote:

I am still getting the same error as @jpmam1 when trying to download other GTEx data, apart from the example (skin dataset). @jnpaulson is there an update with the work-around for this?


jnpaulson commented 6 years ago

Hi all, yes. Just decided on a workaround - and it will be implemented this coming week. I'll update this post soon.

Thank you

jnpaulson commented 6 years ago

@DCGenomics , @ameya225 , @jpmam1

Hi all - apologies for the delay. As a temporary workaround (Bioconductor has a versioning delay), I'm posting the V6 data already formatted as a YARN-processed ExpressionSet.

The file is an RDS object (meaning you read it in with the readRDS function). It's ~1.4 GB.

http://networkmedicine.org:3838/gtex_data/gtex_portal_normalized.rds

To grab it from the source within a script, one can run:

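library(downloader)

# Download to a temporary file; mode = "wb" keeps the binary RDS intact on Windows
tmp = tempfile(fileext = ".rds")
src = "http://networkmedicine.org:3838/gtex_data/gtex_portal_normalized.rds"
download(src, tmp, mode = "wb")

# Read the ExpressionSet into memory
obj = readRDS(tmp)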
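
Because the file is about 1.4 GB and `tempfile()` paths are removed when the R session ends, it may be worth keeping a local copy instead of downloading it again each time. A minimal sketch, assuming base R only; `fetch_cached` and the destination path are hypothetical names, not part of yarn or downloader:

```r
# Hypothetical helper: download `src` to `dest` only if `dest` does not exist yet.
# `fetch` defaults to utils::download.file; mode = "wb" keeps the binary RDS
# intact on Windows.
fetch_cached <- function(src, dest, fetch = utils::download.file) {
  if (!file.exists(dest)) {
    fetch(src, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (the destination path is an assumption):
# src  <- "http://networkmedicine.org:3838/gtex_data/gtex_portal_normalized.rds"
# dest <- fetch_cached(src, "~/gtex_portal_normalized.rds")
# obj  <- readRDS(dest)
```

The sketch does not clean up after an interrupted download, so a partial file left at the destination would need to be deleted by hand before retrying.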

Sas-cmd commented 5 years ago

Hi there,

I was trying to run yarn on version 8 of GTEx. I changed the code for downloadGTEx to reflect the new URLs for the new version, but I am getting an error similar to the one the previous posters reported. Is there a workaround?

I've tried everything I could think of, and it is still not working. Any suggestions? @jnpaulson

EladH1 commented 2 years ago

@DCGenomics , @ameya225 , @jpmam1

Hi all - apologies for the delay. As a temporary workaround (Bioconductor has a versioning delay) I'm placing the V6 data formatted already as a YARN processed ExpressionSet.

Hi, a question related to the workaround: what does the gtex_portal_normalized.rds file contain? Normalized values or raw counts? Were any filters applied to select only part of the GTEx data?

thanks,