Open ChrisAi89 opened 2 years ago
@ChrisAi89 Could you please post the query code?
@tiagochst Thanks for the quick reply. Here is my code:
rm(list = ls())
gc()
library("TCGAbiolinks")
################################################################################
# Get Histoslides for COAD for HG38 harmonized data
################################################################################
setwd("/mnt/project-data-0/")
projects <- getGDCprojects()
projects <- projects[, 1, drop = TRUE]  # keep the first column as a plain vector
for (i in projects) {
  cat(i, "\n")
  # GDCquery() raises an error when a project has no slide images, so wrap it in try()
  query_histo_slides <- try(
    GDCquery(project = i,
             data.category = "Biospecimen",
             data.type = "Slide Image"),
    silent = TRUE
  )
  # inherits() is safer than class(x) == "try-error", because class() can
  # return more than one value and if() needs a single TRUE/FALSE
  if (inherits(query_histo_slides, "try-error")) {
    cat(i, " - Not Slide Images\n")
    next
  } else {
    GDCdownload(query = query_histo_slides, method = "api")
  }
  rm(query_histo_slides)
}
Hi @tiagochst,
did you manage to reproduce the behavior I described? I would just like to know whether I need to find a different approach.
All the best, Chris
Yes, I was able to, but I will need to contact GDC. Here is the problematic file: https://portal.gdc.cancer.gov/files/9412af8e-3c00-44de-a29b-b4801b65ca42
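Until the file is fixed on the GDC side, one possible workaround is to download files one at a time and record which ones fail, so that a single bad file does not abort the whole run. The sketch below is only an illustration: `collect_failures` and `download_one` are hypothetical helper names, not part of TCGAbiolinks, and the commented-out TCGAbiolinks usage is an untested assumption (in particular that the query results carry a `file_id` column and that a single-file download fails with an R error rather than silently).

```r
# Run `download_one(id)` for each id and collect the ids that raise an error.
# Both function names are hypothetical helpers, not TCGAbiolinks functions.
collect_failures <- function(ids, download_one) {
  failed <- character(0)
  for (id in ids) {
    ok <- tryCatch({
      download_one(id)
      TRUE
    }, error = function(e) {
      message(id, " failed: ", conditionMessage(e))
      FALSE
    })
    if (!ok) failed <- c(failed, id)
  }
  failed
}

# Stub demonstration without network access: the id "b" always fails.
stub <- function(id) if (id == "b") stop("not a tar archive")
failed_ids <- collect_failures(c("a", "b", "c"), stub)
print(failed_ids)

# Possible use with TCGAbiolinks (untested assumption, see text above):
# query <- GDCquery(project = "TCGA-READ", data.category = "Biospecimen",
#                   data.type = "Slide Image")
# res <- getResults(query)
# download_one <- function(id) {
#   q <- query
#   q$results[[1]] <- res[res$file_id == id, ]  # restrict the query to one file
#   GDCdownload(query = q, method = "api")
# }
# failed_ids <- collect_failures(res$file_id, download_one)
```

The ids collected this way could then be reported to GDC or excluded from a later bulk download.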
Dear all,
I am trying to download the tissue slides for the TCGA-READ and TCGA-COAD projects, but I get the error `/bin/tar: This does not look like a tar archive`. To me it looks like a problem with `tar` and `gzip`, but the download works fine for other projects. I am using `api` as the download method; `client` does not show any data for the TCGA-READ project. Is there a reason for that?
Thanks and all the best!
The output and the session info follow.
Output:
Downloading data for project TCGA-READ
Of the 530 files for download 72 already exist.
We will download only those that are missing ones.
GDCdownload will download 458 files. A total of 121.391636198 GB
The total size of files is big. We will download files in chunks
Downloading chunk 1 of 153 (3 files, size = 890.805743 MB) as Sat_Aug_27_09_12_54_2022_0.tar.gz
|======================================================================| 100%
/bin/tar: This does not look like a tar archive
gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 153 (3 files, size = 890.805743 MB) as Sat_Aug_27_09_12_54_2022_0.tar.gz
|======================================================================| 100%
/bin/tar: This does not look like a tar archive
gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
Error in if (ret == 1) break : argument is of length zero
Calls: GDCdownload ... tryCatchList -> tryCatchOne -> -> GDCdownload.by.chunk
Execution halted
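The `gzip: stdin: not in gzip format` message suggests that the downloaded chunk is not a gzip stream at all. A gzip file always starts with the two bytes `0x1f 0x8b`, so a quick check of a downloaded chunk could look like the sketch below. `is_gzip` is a hypothetical helper written for this illustration, and the temporary files only stand in for a real chunk such as `Sat_Aug_27_09_12_54_2022_0.tar.gz`.

```r
# Return TRUE if the file starts with the gzip magic bytes 0x1f 0x8b.
# `is_gzip` is a hypothetical helper, not part of TCGAbiolinks.
is_gzip <- function(path) {
  if (!file.exists(path)) return(FALSE)
  magic <- readBin(path, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Self-contained demonstration with temporary files.
# A real gzip file, written through a gzfile() connection:
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines("hello", con)
close(con)

# A plain-text file that merely carries a .tar.gz extension:
txt_path <- tempfile(fileext = ".tar.gz")
writeLines("this is not an archive", txt_path)

print(is_gzip(gz_path))
print(is_gzip(txt_path))
```

If the check fails for a chunk, opening the file in a text editor may show what the server actually returned.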
Session Info:
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] TCGAbiolinks_2.24.3

loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45
[3] tidyr_1.2.0 prettyunits_1.1.1
[5] png_0.1-7 Biostrings_2.64.1
[7] assertthat_0.2.1 digest_0.6.29
[9] utf8_1.2.2 BiocFileCache_2.4.0
[11] plyr_1.8.7 R6_2.5.1
[13] GenomeInfoDb_1.32.3 stats4_4.2.1
[15] RSQLite_2.2.16 httr_1.4.4
[17] ggplot2_3.3.6 pillar_1.8.1
[19] zlibbioc_1.42.0 rlang_1.0.4
[21] progress_1.2.2 curl_4.3.2
[23] data.table_1.14.2 blob_1.2.3
[25] S4Vectors_0.34.0 Matrix_1.4-1
[27] downloader_0.4 readr_2.1.2
[29] stringr_1.4.1 RCurl_1.98-1.8
[31] bit_4.0.4 biomaRt_2.52.0
[33] munsell_0.5.0 DelayedArray_0.22.0
[35] xfun_0.32 compiler_4.2.1
[37] pkgconfig_2.0.3 BiocGenerics_0.42.0
[39] tidyselect_1.1.2 KEGGREST_1.36.3
[41] SummarizedExperiment_1.26.1 tibble_3.1.8
[43] GenomeInfoDbData_1.2.8 IRanges_2.30.1
[45] matrixStats_0.62.0 XML_3.99-0.10
[47] fansi_1.0.3 crayon_1.5.1
[49] dplyr_1.0.9 tzdb_0.3.0
[51] dbplyr_2.2.1 rappdirs_0.3.3
[53] bitops_1.0-7 grid_4.2.1
[55] jsonlite_1.8.0 gtable_0.3.0
[57] lifecycle_1.0.1 DBI_1.1.3
[59] magrittr_2.0.3 scales_1.2.1
[61] cli_3.3.0 TCGAbiolinksGUI.data_1.16.0
[63] stringi_1.7.8 cachem_1.0.6
[65] XVector_0.36.0 xml2_1.3.3
[67] filelock_1.0.2 ellipsis_0.3.2
[69] generics_0.1.3 vctrs_0.4.1
[71] tools_4.2.1 bit64_4.0.5
[73] Biobase_2.56.0 glue_1.6.2
[75] purrr_0.3.4 hms_1.1.2
[77] MatrixGenerics_1.8.1 fastmap_1.1.0
[79] AnnotationDbi_1.58.0 colorspace_2.0-3
[81] GenomicRanges_1.48.0 rvest_1.0.3
[83] memoise_2.0.1 knitr_1.40