BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Problem for downloading TCGA cancer data by TCGAbiolinks #376

Open modarzi opened 4 years ago

modarzi commented 4 years ago

Hi, I would like to download TCGA SARC data from GDC with the TCGAbiolinks package. I wrote the code below for that purpose:

```
query <- GDCquery(project = "TCGA-SARC",
                  sample.type = "Primary solid Tumor",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")
GDCdownload(query)
```

When I run the code below:

    data <- GDCprepare(query)

I see the message below in the console:

    Downloading data for project TCGA-SARC
    GDCdownload will download 259 files. A total of 64.930633 MB
    Downloading as: Thu_Dec_12_14_39_40_2019.tar.gz
    Downloading: 65 MB

After a while, once the 65 MB had downloaded, no `data` object appeared in R. Instead I see messages like the one below:

    ff8fba5e-f6e2-4db6-87cf-11958b27bc37/2b66a8a5-1e0a-4a4a-beb7-b2273b2ffb05.htseq.counts.gz: Can't create '\\\\?\\F:\\ff8fba5e-f6e2-4db6-87cf-11958b27bc37\\2b66a8a5-1e0a-4a4a-beb7-b2273b2ffb05.htseq.counts.gz'
    tar.exe: Error exit delayed from previous errors.

I used setwd() to set the working directory to my F: drive. Now on the F: drive I see just 2 files:

1. MANIFEST.txt

2. Thu_Dec_12_14_39_40_2019.tar.gz

When I check the properties of the 'Thu_Dec_12_14_39_40_2019.tar.gz' file, the size is 61.6 MB (64,643,687 bytes).
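
Since the archive is on disk but its contents are not, one way to narrow things down is to check whether the archive itself can be read and extracted. The sketch below uses only base R (`untar` with `tar = "internal"`, which asks R to use its own tar implementation instead of an external `tar.exe`). The function name `inspect_and_extract` and the target folder in the usage line are hypothetical, and this is a diagnostic under those assumptions, not a confirmed fix for the error above.

```r
# Hypothetical helper: list the members of a downloaded archive and try to
# extract it with R's internal tar implementation.
inspect_and_extract <- function(tarball, exdir) {
  stopifnot(file.exists(tarball))

  # List archive members without writing anything to disk.
  members <- untar(tarball, list = TRUE, tar = "internal")

  # Extract into a separate folder so the result is easy to inspect.
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  status <- untar(tarball, exdir = exdir, tar = "internal")
  if (!identical(as.integer(status), 0L)) {
    warning("untar() returned a non-zero status: ", status)
  }

  # Report what actually landed on disk.
  list(members   = members,
       extracted = list.files(exdir, recursive = TRUE))
}
```

A call might look like `res <- inspect_and_extract("F:/Thu_Dec_12_14_39_40_2019.tar.gz", "C:/tcga_check")`, where `C:/tcga_check` is an assumed scratch folder. If the files extract cleanly, the archive is probably intact and the failure is more likely tied to `tar.exe` or to writing on that drive. Extracting by hand does not necessarily reproduce the folder layout that `GDCprepare` looks for, so rerunning `GDCdownload` is likely still needed afterwards.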

I don't know what the problem is or what I should do next.
I would appreciate it if anybody could share their comments.

    > sessionInfo()
    R version 3.6.1 (2019-07-05)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 17134)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

    attached base packages:
    [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
     [1] SummarizedExperiment_1.16.0 DelayedArray_0.12.0         matrixStats_0.55.0          Biobase_2.46.0             
     [5] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0         IRanges_2.20.0              S4Vectors_0.24.0           
     [9] BiocGenerics_0.32.0         TCGAbiolinks_2.14.0         sva_3.34.0                  BiocParallel_1.20.0        
    [13] genefilter_1.68.0           mgcv_1.8-30                 nlme_3.1-141                stringr_1.4.0              
    [17] dplyr_0.8.3                 Hmisc_4.2-0                 ggplot2_3.2.1               Formula_1.2-3              
    [21] survival_2.44-1.1           lattice_0.20-38             impute_1.60.0               cluster_2.1.0              
    [25] class_7.3-15                MASS_7.3-51.4               sqldf_0.4-11                RSQLite_2.1.2              
    [29] gsubfn_0.7                  proto_1.0.0                 WGCNA_1.68                  fastcluster_1.1.25         
    [33] dynamicTreeCut_1.63-1      

    loaded via a namespace (and not attached):
      [1] backports_1.1.5             circlize_0.4.8              aroma.light_3.16.0          BiocFileCache_1.10.0       
      [5] plyr_1.8.4                  selectr_0.4-1               ConsensusClusterPlus_1.50.0 lazyeval_0.2.2             
      [9] splines_3.6.1               robust_0.4-18.1             digest_0.6.23               foreach_1.4.7              
     [13] htmltools_0.4.0             GO.db_3.10.0                magrittr_1.5                checkmate_1.9.4            
     [17] memoise_1.1.0               fit.models_0.5-14           doParallel_1.0.15           limma_3.42.0               
     [21] ComplexHeatmap_2.2.0        Biostrings_2.54.0           readr_1.3.1                 annotate_1.64.0            
     [25] R.utils_2.9.0               askpass_1.1                 prettyunits_1.0.2           colorspace_1.4-1           
     [29] rvest_0.3.4                 ggrepel_0.8.1               blob_1.2.0                  rappdirs_0.3.1             
     [33] rrcov_1.4-7                 xfun_0.10                   jsonlite_1.6                tcltk_3.6.1                
     [37] crayon_1.3.4                RCurl_1.95-4.12             zeallot_0.1.0               zoo_1.8-6                  
     [41] iterators_1.0.12            glue_1.3.1                  survminer_0.4.6             gtable_0.3.0               
     [45] zlibbioc_1.32.0             XVector_0.26.0              GetoptLong_0.1.7            shape_1.4.4                
     [49] DEoptimR_1.0-8              scales_1.0.0                DESeq_1.38.0                mvtnorm_1.0-11             
     [53] edgeR_3.28.0                DBI_1.0.0                   ggthemes_4.2.0              Rcpp_1.0.3                 
     [57] xtable_1.8-4                progress_1.2.2              htmlTable_1.13.2            clue_0.3-57                
     [61] matlab_1.0.2                foreign_0.8-72              bit_1.1-14                  km.ci_0.5-2                
     [65] preprocessCore_1.48.0       htmlwidgets_1.5.1           httr_1.4.1                  RColorBrewer_1.1-2         
     [69] acepack_1.4.1               pkgconfig_2.0.3             XML_3.98-1.20               R.methodsS3_1.7.1          
     [73] nnet_7.3-12                 dbplyr_1.4.2                locfit_1.5-9.1              tidyselect_0.2.5           
     [77] rlang_0.4.2                 AnnotationDbi_1.48.0        munsell_0.5.0               tools_3.6.1                
     [81] downloader_0.4              generics_0.0.2              broom_0.5.2                 knitr_1.25                 
     [85] bit64_0.9-7                 robustbase_0.93-5           survMisc_0.5.5              purrr_0.3.3                
     [89] EDASeq_2.20.0               R.oo_1.23.0                 xml2_1.2.2                  biomaRt_2.42.0             
     [93] compiler_3.6.1              rstudioapi_0.10             curl_4.2                    png_0.1-7                  
     [97] ggsignif_0.6.0              tibble_2.1.3                geneplotter_1.64.0          pcaPP_1.9-73               
    [101] stringi_1.4.3               GenomicFeatures_1.38.0      Matrix_1.2-17               KMsurv_0.1-5               
    [105] vctrs_0.2.0                 lifecycle_0.1.0             pillar_1.4.2                GlobalOptions_0.1.1        
    [109] data.table_1.12.6           bitops_1.0-6                rtracklayer_1.46.0          hwriter_1.3.2              
    [113] R6_2.4.0                    latticeExtra_0.6-28         ShortRead_1.44.0            gridExtra_2.3              
    [117] codetools_0.2-16            assertthat_0.2.1            chron_2.3-54                openssl_1.4.1              
    [121] rjson_0.2.20                withr_2.1.2                 GenomicAlignments_1.22.0    Rsamtools_2.2.0            
    [125] GenomeInfoDbData_1.2.2      hms_0.5.2                   grid_3.6.1                  rpart_4.1-15               
    [129] tidyr_1.0.0

tiagochst commented 4 years ago

I don't have a Windows machine to check the problem right now, but it seems the downloaded file could not be uncompressed. Did you try using `GDCdownload(query, method = "client")`?

Also, GDC changed "Primary solid Tumor" to "Primary Tumor".

    query <- GDCquery(project = "TCGA-SARC",
                      sample.type = "Primary Tumor",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts")
    GDCdownload(query, files.per.chunk = 50)
    data <- GDCprepare(query)

It works on my side, but I am on Linux: http://rpubs.com/tiagochst/TCGAbiolinks_376
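
For reference, the two suggestions in this reply (the `"client"` download method and the renamed sample type) can be combined as in the sketch below. The wrapper name `download_sarc_counts` is hypothetical; the `GDCquery`, `GDCdownload` and `GDCprepare` calls and their arguments are the ones used in this thread, plus the `directory` argument that both `GDCdownload` and `GDCprepare` accept. Whether this avoids the Windows extraction error has not been verified here, and the query values reflect the GDC as it was when this thread was written.

```r
# Hypothetical wrapper around the calls discussed in this thread.
# Requires the TCGAbiolinks package and network access when called.
download_sarc_counts <- function(directory = "GDCdata") {
  query <- TCGAbiolinks::GDCquery(
    project       = "TCGA-SARC",
    sample.type   = "Primary Tumor",   # renamed from "Primary solid Tumor"
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = "HTSeq - Counts"
  )

  # method = "client" uses the GDC Data Transfer Tool instead of the API.
  TCGAbiolinks::GDCdownload(query, method = "client", directory = directory)

  # Read the downloaded files from the same directory.
  TCGAbiolinks::GDCprepare(query, directory = directory)
}
```

Calling `data <- download_sarc_counts()` would then run the query, the download and the preparation in one step. Passing the same `directory` to both functions keeps the download location and the location `GDCprepare` reads from in sync.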