BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Problem with GDCquery for downloading "Biospecimen" #360

Closed boseb closed 3 years ago

boseb commented 5 years ago

Hi, I am trying to download TCGA biospecimen data using the query below:

library(TCGAbiolinks)

queryOV <- GDCquery(project = "TCGA-OV",
                    data.category = "Biospecimen",
                    legacy = FALSE,
                    file.type = "xml")

# List the cases covered by the query
ov_patients <- getResults(queryOV, cols = "cases")

GDCdownload(queryOV, directory = "GDC_CLINICAL")

I get the following error:
Downloading data for project TCGA-OV
GDCdownload will download 1110 files. A total of 71.799491 MB
Downloading as: Mon_Oct_21_20_16_15_2019.tar.gz
  |============================================================================================================| 100%
/usr/bin/gtar: This does not look like a tar archive

gzip: stdin: not in gzip format
/usr/bin/gtar: Child returned status 1
/usr/bin/gtar: Error is not recoverable: exiting now
Download completed
Warning message:
In fun(libname, pkgname) : couldn't connect to display ":0"

I also checked the downloaded file: it is not a tar or gzip archive but plain ASCII text, so extraction cannot work:

cat Mon_Oct_21_12_34_28_2019.tar.gz

{ "message": "Your token is invalid or expired. Please get a new token from GDC Data Portal." }
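As the `cat` output above shows, the server can return an error message that is then saved under the `.tar.gz` name. A minimal base-R sketch for checking a download before extracting it: a real gzip file starts with the two magic bytes `0x1f 0x8b`, so anything else is most likely an error body. The helper name `is_gzip_file` is hypothetical and not part of TCGAbiolinks.

```r
# Hypothetical helper (not part of TCGAbiolinks): TRUE if the file at `path`
# begins with the two gzip magic bytes 0x1f 0x8b, FALSE otherwise.
is_gzip_file <- function(path) {
  # Binary mode; raw = TRUE also disables R's compressed-file detection.
  con <- file(path, open = "rb", raw = TRUE)
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Example: the archive name is taken from the output above; adjust as needed.
downloaded <- "Mon_Oct_21_12_34_28_2019.tar.gz"
if (file.exists(downloaded) && !is_gzip_file(downloaded)) {
  # Not an archive: print whatever the server sent back instead.
  cat(readLines(downloaded, warn = FALSE), sep = "\n")
}
```

An empty or truncated file also returns `FALSE`, since fewer than two bytes can be read.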

Could you please help me resolve this issue?


> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=en_US.UTF-8          
 [4] LC_COLLATE=en_US.UTF-8        LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8           LC_ADDRESS=en_US.UTF-8       
[10] LC_TELEPHONE=en_US.UTF-8      LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] regexPipes_0.0.1            dnet_1.1.4                  supraHex_1.22.0             hexbin_1.27.3              
 [5] rlang_0.4.0                 doParallel_1.0.15           iterators_1.0.12            Rcpp_1.0.2                 
 [9] RTCGAToolbox_2.14.0         DT_0.8                      dplyr_0.8.3                 iRefR_1.13                 
[13] RBGL_1.58.2                 graph_1.62.0                igraph_1.2.4.1              WGCNA_1.68                 
[17] fastcluster_1.1.25          dynamicTreeCut_1.63-1       clusterSim_0.47-4           MASS_7.3-51.4              
[21] cluster_2.0.8               zoo_1.8-6                   foreign_0.8-71              sp_1.3-1                   
[25] limma_3.40.6                SummarizedExperiment_1.14.1 DelayedArray_0.10.0         BiocParallel_1.18.1        
[29] matrixStats_0.55.0          Biobase_2.44.0              GenomicRanges_1.36.1        GenomeInfoDb_1.20.0        
[33] IRanges_2.18.2              S4Vectors_0.22.1            BiocGenerics_0.30.0         TCGAbiolinks_2.13.6        
[37] tidyr_1.0.0                 caret_6.0-84                ggplot2_3.2.1               lattice_0.20-38            
[41] biomaRt_2.40.4              xlsx_0.6.1                  CePa_0.6                    pbapply_1.4-1              
[45] reshape2_1.4.3              reshape_0.8.8               sqldf_0.4-11                RSQLite_2.1.2              
[49] gsubfn_0.7                  proto_1.0.0                 data.table_1.12.2           plyr_1.8.4                 
[53] stringr_1.4.0               janitor_1.1.1               glmnet_2.0-18               foreach_1.4.7              
[57] Matrix_1.2-17               readr_1.3.1                 BBmisc_1.11                 svdvis_0.1                 

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1              rtracklayer_1.44.4          ggthemes_4.2.0              GGally_1.4.0               
  [5] ModelMetrics_1.2.2          R.methodsS3_1.7.1           acepack_1.4.1               bit64_0.9-7                
  [9] knitr_1.24                  aroma.light_3.14.0          R.utils_2.9.0               rpart_4.1-15               
 [13] hwriter_1.3.2               RCurl_1.95-4.12             generics_0.0.2              GenomicFeatures_1.36.4     
 [17] preprocessCore_1.46.0       chron_2.3-54                bit_1.1-14                  webshot_0.5.1              
 [21] xml2_1.2.2                  lubridate_1.7.4             httpuv_1.5.1                assertthat_0.2.1           
 [25] gower_0.2.1                 xfun_0.9                    hms_0.5.1                   rJava_0.9-11               
 [29] promises_1.0.1              DEoptimR_1.0-8              progress_1.2.2              dbplyr_1.3.0               
 [33] Rgraphviz_2.26.0            km.ci_0.5-2                 DBI_1.0.0                   geneplotter_1.62.0         
 [37] htmlwidgets_1.3             EDASeq_2.18.0               matlab_1.0.2                purrr_0.3.2                
 [41] crosstalk_1.0.0             selectr_0.4-1               ggpubr_0.2.3                backports_1.1.4            
 [45] annotate_1.62.0             vctrs_0.2.0                 withr_2.1.2                 robustbase_0.93-4          
 [49] checkmate_1.9.4             GenomicAlignments_1.20.1    prettyunits_1.0.2           ape_5.3                    
 [53] lazyeval_0.2.2              crayon_1.3.4                genefilter_1.66.0           edgeR_3.26.8               
 [57] recipes_0.1.6               pkgconfig_2.0.2             nlme_3.1-139                nnet_7.3-12                
 [61] RJSONIO_1.3-1.2             lifecycle_0.1.0             miniUI_0.1.1.1              downloader_0.4             
 [65] BiocFileCache_1.6.0         tcltk_3.6.0                 KMsurv_0.1-5                base64enc_0.1-3            
 [69] GlobalOptions_0.1.0         png_0.1-7                   rjson_0.2.20                bitops_1.0-6               
 [73] R.oo_1.22.0                 ConsensusClusterPlus_1.48.0 Biostrings_2.52.0           blob_1.2.0                 
 [77] rgl_0.100.30                R2HTML_2.3.2                shape_1.4.4                 manipulateWidget_0.10.0    
 [81] ShortRead_1.42.0            robust_0.4-18               ggsignif_0.6.0              scales_1.0.0               
 [85] memoise_1.1.0               magrittr_1.5                zlibbioc_1.30.0             compiler_3.6.0             
 [89] RColorBrewer_1.1-2          clue_0.3-57                 rrcov_1.4-7                 Rsamtools_2.0.0            
 [93] ade4_1.7-13                 XVector_0.24.0              htmlTable_1.13.1            Formula_1.2-3              
 [97] mgcv_1.8-28                 tidyselect_0.2.5            stringi_1.4.3               locfit_1.5-9.1             
[101] latticeExtra_0.6-28         ggrepel_0.8.1               survMisc_0.5.5              grid_3.6.0                 
[105] RaggedExperiment_1.8.0      tools_3.6.0                 circlize_0.4.8              rstudioapi_0.10            
[109] gridExtra_2.3               prodlim_2018.04.18          digest_0.6.20               shiny_1.3.1                
[113] lava_1.6.6                  broom_0.5.2                 later_0.8.0                 httr_1.4.1                 
[117] survminer_0.4.6             AnnotationDbi_1.46.1        RCircos_1.2.1               ComplexHeatmap_2.0.0       
[121] colorspace_1.4-1            rvest_0.3.4                 XML_3.98-1.20               splines_3.6.0              
[125] xlsxjars_0.6.1              fit.models_0.5-14           xtable_1.8-4                jsonlite_1.6               
[129] timeDate_3043.102           zeallot_0.1.0               ipred_0.9-8                 R6_2.4.0                   
[133] Hmisc_4.2-0                 pillar_1.4.2                htmltools_0.3.6             mime_0.7                   
[137] glue_1.3.1                  DESeq_1.36.0                class_7.3-15                codetools_0.2-16           
[141] pcaPP_1.9-73                mvtnorm_1.0-11              tibble_2.1.3                sva_3.32.1                 
[145] curl_4.0                    GO.db_3.8.2                 survival_2.44-1.1           munsell_0.5.0              
[149] e1071_1.7-2                 GetoptLong_0.1.7            GenomeInfoDbData_1.2.1      impute_1.58.0              
[153] gtable_0.3.0       
tiagochst commented 5 years ago

I checked your code, and there is indeed a problem that I need to debug.

For the moment, could you look at the BCR Biotab files? They should contain the same data as the XML files.

Here is an example: http://rpubs.com/tiagochst/BCR_Biotab

You will need to update the package to the GitHub version:

withr::with_envvar(c(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true"), 
  remotes::install_github('BioinformaticsFMRP/TCGAbiolinks')
)
boseb commented 5 years ago

Thanks, got my required data.


(Sent by email in reply to the GitHub notification of Tiago Chedraoui Silva's comment, dated Tuesday, October 22, 2019 1:05 PM.)


boseb commented 5 years ago

Hi, I am getting a similar error when trying to download RNA-Seq count data for BRCA:

Downloading data for project TCGA-BRCA
GDCdownload will download 1222 files. A total of 310.760859 MB
Downloading as: Thu_Oct_24_13_55_13_2019.tar.gz
  |====================================================================== 100%
/usr/bin/gtar: This does not look like a tar archive

gzip: stdin: not in gzip format
/usr/bin/gtar: Child returned status 1
/usr/bin/gtar: Error is not recoverable: exiting now

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
504 Gateway Timeout
Gateway Timeout
The gateway did not receive a timely response from the upstream server or application.

My query was:

# fileName holds the project suffix (here "BRCA")
Rseq.query <- GDCquery(project = paste0("TCGA-", fileName),
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification",
                       workflow.type = "HTSeq - Counts",
                       legacy = FALSE)
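The 504 response suggests the gateway gave up while the server was packing all 1222 files into a single archive; requesting the files in smaller batches is a common workaround. A minimal base-R sketch of the batching idea, assuming placeholder IDs rather than real GDC file UUIDs; the helper name `chunk_ids` is hypothetical. TCGAbiolinks' `GDCdownload()` has a `files.per.chunk` argument that applies the same idea, but check `?GDCdownload` in your installed version.

```r
# Hypothetical helper (not part of TCGAbiolinks): split a vector of file IDs
# into consecutive batches of at most `chunk_size` elements.
chunk_ids <- function(ids, chunk_size) {
  stopifnot(is.numeric(chunk_size), chunk_size >= 1)
  split(ids, ceiling(seq_along(ids) / chunk_size))
}

# Placeholder IDs standing in for the 1222 files in the log (not real GDC UUIDs).
ids <- sprintf("file_%04d", seq_len(1222))
batches <- chunk_ids(ids, 100)
length(batches)          # 13 batches
lengths(batches)[[13]]   # the last batch holds the remaining 22 files

# With TCGAbiolinks (assumption: verify the argument with ?GDCdownload):
# GDCdownload(Rseq.query, method = "api", files.per.chunk = 100)
```

Smaller batches mean more requests, but each archive is built and transferred quickly enough to stay under the gateway timeout.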

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1 matrixStats_0.55.0
[5] Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.3
[9] S4Vectors_0.22.1 BiocGenerics_0.30.0 biomaRt_2.40.5 TCGAbiolinks_2.13.6
[13] edgeR_3.26.8 limma_3.40.6 pbapply_1.4-1 reshape2_1.4.3
[17] reshape_0.8.8 sqldf_0.4-11 RSQLite_2.1.2 gsubfn_0.7
[21] proto_1.0.0 readr_1.3.1 plyr_1.8.4 stringr_1.4.0
[25] janitor_1.1.1 glmnet_2.0-18 foreach_1.4.7 Matrix_1.2-17
[29] data.table_1.12.6

loaded via a namespace (and not attached):
[1] backports_1.1.5 circlize_0.4.8 aroma.light_3.14.0 selectr_0.4-1
[5] ConsensusClusterPlus_1.48.0 lazyeval_0.2.2 splines_3.6.0 usethis_1.5.1
[9] ggplot2_3.2.1 sva_3.32.1 digest_0.6.22 magrittr_1.5
[13] memoise_1.1.0 cluster_2.0.8 doParallel_1.0.15 remotes_2.1.0
[17] ComplexHeatmap_2.0.0 Biostrings_2.52.0 annotate_1.62.0 R.utils_2.9.0
[21] prettyunits_1.0.2 colorspace_1.4-1 blob_1.2.0 rvest_0.3.4
[25] ggrepel_0.8.1 xfun_0.10 dplyr_0.8.3 callr_3.3.2
[29] tcltk_3.6.0 crayon_1.3.4 RCurl_1.95-4.12 jsonlite_1.6
[33] genefilter_1.66.0 zeallot_0.1.0 survival_2.44-1.1 zoo_1.8-6
[37] iterators_1.0.12 glue_1.3.1 survminer_0.4.6 gtable_0.3.0
[41] zlibbioc_1.30.0 XVector_0.24.0 GetoptLong_0.1.7 pkgbuild_1.0.6
[45] shape_1.4.4 scales_1.0.0 DESeq_1.36.0 DBI_1.0.0
[49] ggthemes_4.2.0 Rcpp_1.0.2 xtable_1.8-4 progress_1.2.2
[53] clue_0.3-57 bit_1.1-14 matlab_1.0.2 km.ci_0.5-2
[57] httr_1.4.1 RColorBrewer_1.1-2 ellipsis_0.3.0 pkgconfig_2.0.3
[61] XML_3.98-1.20 R.methodsS3_1.7.1 locfit_1.5-9.1 tidyselect_0.2.5
[65] rlang_0.4.0 AnnotationDbi_1.46.1 munsell_0.5.0 tools_3.6.0
[69] cli_1.1.0 downloader_0.4 generics_0.0.2 devtools_2.2.1
[73] broom_0.5.2 fs_1.3.1 processx_3.4.1 knitr_1.25
[77] bit64_0.9-7 survMisc_0.5.5 purrr_0.3.3 EDASeq_2.18.0
[81] nlme_3.1-139 R.oo_1.22.0 xml2_1.2.2 compiler_3.6.0
[85] rstudioapi_0.10 curl_4.2 png_0.1-7 testthat_2.2.1
[89] ggsignif_0.6.0 tibble_2.1.3 geneplotter_1.62.0 stringi_1.4.3
[93] highr_0.8 ps_1.3.0 desc_1.2.0 GenomicFeatures_1.36.4
[97] lattice_0.20-38 KMsurv_0.1-5 vctrs_0.2.0 pillar_1.4.2
[101] lifecycle_0.1.0 GlobalOptions_0.1.1 bitops_1.0-6 rtracklayer_1.44.4
[105] R6_2.4.0 latticeExtra_0.6-28 hwriter_1.3.2 ShortRead_1.42.0
[109] gridExtra_2.3 sessioninfo_1.1.1 codetools_0.2-16 pkgload_1.0.2
[113] assertthat_0.2.1 chron_2.3-54 rprojroot_1.3-2 rjson_0.2.20
[117] withr_2.1.2 GenomicAlignments_1.20.1 Rsamtools_2.0.3 GenomeInfoDbData_1.2.1
[121] mgcv_1.8-28 hms_0.5.1 grid_3.6.0 tidyr_1.0.0
[125] ggpubr_0.2.3

Do you have any other way of downloading it?

Thanks,
Bose