BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Error in GDCprepare #110

Closed gattofrancesco closed 2 years ago

gattofrancesco commented 7 years ago

Hi, I am encountering this error when calling GDCprepare:

project <- "TCGA-UVM"
query <- GDCquery(project = project,
                  data.category = c("Transcriptome Profiling"),
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts",
                  sample.type = c("Primary solid Tumor", "Solid Tissue Normal"))

GDCdownload(query, method = "api", chunks.per.download = 10)
data <- GDCprepare(query, directory = "GDCdata")
|==========================================================================================| 100%
Starting to add information to samples
 => Add clinical information to samples
Error: lexical error: invalid char in json text.
          <?xml version="1.0" ?> <respons
                     (right here) ------^

This is my sessionInfo():

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RJSONIO_1.3-0 rjson_0.2.15 DT_0.2 TCGAbiolinks_2.4.3

loaded via a namespace (and not attached):
[1] R.utils_2.5.0 RSQLite_2.0 AnnotationDbi_1.38.1
[4] htmlwidgets_0.8 grid_3.4.0 trimcluster_0.1-2
[7] BiocParallel_1.10.1 DESeq_1.28.0 munsell_0.4.3
[10] codetools_0.2-15 preprocessCore_1.38.1 colorspace_1.3-2
[13] GOSemSim_2.2.0 BiocInstaller_1.26.0 Biobase_2.36.2
[16] knitr_1.16 supraHex_1.14.0 stats4_3.4.0
[19] robustbase_0.92-7 DOSE_3.2.0 pathview_1.16.1
[22] KEGGgraph_1.34.0 GenomeInfoDbData_0.99.0 mnormt_1.5-5
[25] hwriter_1.3.2 KMsurv_0.1-5 bit64_0.9-7
[28] downloader_0.4 c3net_1.1.1 ggthemes_3.4.0
[31] EDASeq_2.10.0 diptest_0.75-7 R6_2.2.2
[34] doParallel_1.0.10 GenomeInfoDb_1.12.2 locfit_1.5-9.1
[37] flexmix_2.3-14 bitops_1.0-6 fgsea_1.2.1
[40] DelayedArray_0.2.7 assertthat_0.2.0 scales_0.4.1
[43] nnet_7.3-12 gtable_0.2.0 affy_1.54.0
[46] rlang_0.1.1 genefilter_1.58.1 cmprsk_2.2-7
[49] GlobalOptions_0.0.12 splines_3.4.0 rtracklayer_1.36.3
[52] lazyeval_0.2.0 hexbin_1.27.1 selectr_0.3-1
[55] broom_0.4.2 yaml_2.1.14 reshape2_1.4.2
[58] GenomicFeatures_1.28.3 qvalue_2.8.0 clusterProfiler_3.4.4
[61] tools_3.4.0 psych_1.7.5 ggplot2_2.2.1
[64] affyio_1.46.0 RColorBrewer_1.1-2 BiocGenerics_0.22.0
[67] Rcpp_0.12.11 plyr_1.8.4 zlibbioc_1.22.0
[70] purrr_0.2.2.2 RCurl_1.95-4.8 ggpubr_0.1.3
[73] GetoptLong_0.1.6 viridis_0.4.0 S4Vectors_0.14.3
[76] zoo_1.8-0 SummarizedExperiment_1.6.3 ggrepel_0.6.5
[79] cluster_2.0.6 magrittr_1.5 data.table_1.10.4
[82] dnet_1.0.10 DO.db_2.9 circlize_0.4.0
[85] survminer_0.4.0 mvtnorm_1.0-6 whisker_0.3-2
[88] matrixStats_0.52.2 aroma.light_3.6.0 hms_0.3
[91] xtable_1.8-2 minet_3.34.0 XML_3.98-1.9
[94] mclust_5.3 IRanges_2.10.2 gridExtra_2.2.1
[97] shape_1.4.2 compiler_3.4.0 biomaRt_2.32.1
[100] tibble_1.3.3 R.oo_1.21.0 htmltools_0.3.6
[103] tidyr_0.6.3 geneplotter_1.54.0 DBI_0.7
[106] matlab_1.0.2 ComplexHeatmap_1.14.0 MASS_7.3-47
[109] fpc_2.1-10 ShortRead_1.34.0 Matrix_1.2-10
[112] readr_1.1.1 parmigene_1.0.2 R.methodsS3_1.7.1
[115] parallel_3.4.0 bindr_0.1 igraph_1.0.1
[118] GenomicRanges_1.28.3 pkgconfig_2.0.1 km.ci_0.5-2
[121] rvcheck_0.0.8 GenomicAlignments_1.12.1 foreign_0.8-69
[124] xml2_1.1.1 foreach_1.4.3 annotate_1.54.0
[127] XVector_0.16.0 rvest_0.3.2 stringr_1.2.0
[130] digest_0.6.12 ConsensusClusterPlus_1.40.0 graph_1.54.0
[133] Biostrings_2.44.1 fastmatch_1.1-0 survMisc_0.5.4
[136] dendextend_1.5.2 edgeR_3.18.1 curl_2.7
[139] kernlab_0.9-25 Rsamtools_1.28.0 modeltools_0.2-21
[142] nlme_3.1-131 jsonlite_1.5 bindrcpp_0.2
[145] viridisLite_0.2.0 limma_3.32.2 lattice_0.20-35
[148] KEGGREST_1.16.0 httr_1.2.1 DEoptimR_1.0-8
[151] survival_2.41-3 GO.db_3.4.1 glue_1.1.1
[154] png_0.1-7 prabclus_2.2-6 iterators_1.0.8
[157] bit_1.1-12 Rgraphviz_2.20.0 class_7.3-14
[160] stringi_1.1.5 blob_1.1.0 latticeExtra_0.6-28
[163] memoise_1.1.0 dplyr_0.7.1 ape_4.1

tiagochst commented 7 years ago

This problem is caused by memory issues: the request is too big. I'm not sure whether it is a problem in an R library or in the GDC API. I added an exception handler that reduces the number of items requested at once if the request fails. This should solve the problem.

Could you please update to the latest version and try again?

devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")
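The retry-with-a-smaller-request idea described in this comment can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code that was added to TCGAbiolinks: `fetch_in_chunks()` and `fake_fetch()` are hypothetical names, and the "request" here is a stand-in function that fails whenever it is given more than three IDs.

```r
# Hypothetical helper, not part of TCGAbiolinks: call `fetch` on `ids`, and if
# a call errors, retry that chunk in halves until it succeeds or a chunk of
# `min_size` still fails.
fetch_in_chunks <- function(ids, fetch, chunk_size = length(ids), min_size = 1) {
  if (length(ids) == 0) return(list())
  groups <- ceiling(seq_along(ids) / chunk_size)
  chunks <- unname(split(ids, groups))
  results <- list()
  for (chunk in chunks) {
    # Wrap the value in a list so a legitimate NULL result is not mistaken
    # for a failure; on error, tryCatch returns the condition object instead.
    res <- tryCatch(list(value = fetch(chunk)), error = function(e) e)
    if (!inherits(res, "error")) {
      results <- c(results, list(res$value))
    } else if (length(chunk) <= min_size) {
      stop("request failed at the minimum chunk size: ", conditionMessage(res))
    } else {
      smaller <- ceiling(length(chunk) / 2)
      results <- c(results, fetch_in_chunks(chunk, fetch, smaller, min_size))
    }
  }
  results
}

# Stand-in for a remote call that rejects large requests (an assumption made
# purely for this demonstration).
fake_fetch <- function(chunk) {
  if (length(chunk) > 3) stop("request too big")
  paste(chunk, collapse = ",")
}

out <- fetch_in_chunks(sprintf("id%02d", 1:10), fake_fetch)
```

Halving on failure keeps the number of calls low when large requests succeed, and only falls back to small requests for the chunks that actually fail.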

gattofrancesco commented 7 years ago

Hi Tiago, thanks for fixing this. That error is solved. However, R now cannot find the function "colData" (I am following the tutorial):

data <- GDCprepare(query, directory = "GDCdata", summarizedExperiment = T)
|=============================================================================================================| 100%
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Downloading genome information (try:0)
Using: Human genes (GRCh38.p10)
Loading from disk
From the 60488 genes we couldn't map 3453

head(data)
class: RangedSummarizedExperiment
dim: 6 80
metadata(0):
assays(1): HTSeq - Counts
rownames(6): ENSG00000000003 ENSG00000000005 ... ENSG00000000460 ENSG00000000938
rowData names(3): ensembl_gene_id external_gene_name original_ensembl_gene_id
colnames(80): TCGA-V4-A9ET-01A-11R-A405-07 TCGA-V4-A9EL-01A-11R-A405-07 ... TCGA-VD-A8KB-01A-11R-A405-07 TCGA-VD-A8KD-01A-11R-A405-07
colData names(55): patient barcode ... project_id name

datatable(as.data.frame(colData(data)),
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
Error in colData(data) : could not find function "colData"

tiagochst commented 7 years ago

This function is from the SummarizedExperiment package. Just call

library(SummarizedExperiment)

And this function will be available.
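A minimal sketch of the two ways to reach `colData`, assuming the SummarizedExperiment package is installed. The toy object below uses made-up values and merely stands in for the result of `GDCprepare()`; the names `counts`, `sample_info`, and `se` are illustrative only.

```r
library(SummarizedExperiment)  # attaches colData(), assay(), rowData(), ...

# Toy object standing in for the GDCprepare() result (made-up values).
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
sample_info <- S4Vectors::DataFrame(sample_type = c("tumor", "normal"),
                                    row.names = c("s1", "s2"))
se <- SummarizedExperiment(assays = list(counts = counts),
                           colData = sample_info)

# Works because the package is attached.
meta <- as.data.frame(colData(se))

# Also works without library(): call the function through its namespace.
meta_ns <- as.data.frame(SummarizedExperiment::colData(se))
```

The namespaced form is handy in scripts where attaching the whole package is not wanted.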

PanSX-Dr commented 2 years ago

Hi, something goes wrong with GDCprepare:

library(TCGAbiolinks)
gbm_query <- GDCquery(project = "TCGA-GBM",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "STAR - Counts")
GDCdownload(gbm_query)
expr_gbm_data <- GDCprepare(query = gbm_query, directory = "GDCdata", summarizedExperiment = T)
|==========================================================|100% Completed after 26 s
Error in stop_subscript():
! Can't subset columns that don't exist.
x Locations 2, 3, and 4 don't exist.
ℹ There are only 1 column.
Run rlang::last_error() to see where the error occurred.

My sessionInfo():

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8
[2] LC_CTYPE=Chinese (Simplified)_China.utf8
[3] LC_MONETARY=Chinese (Simplified)_China.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.utf8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] SummarizedExperiment_1.24.0 Biobase_2.54.0
[3] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[5] IRanges_2.28.0 S4Vectors_0.32.3
[7] BiocGenerics_0.40.0 MatrixGenerics_1.6.0
[9] matrixStats_0.61.0 TCGAbiolinks_2.22.4

loaded via a namespace (and not attached):
[1] httr_1.4.2 tidyr_1.2.0
[3] vroom_1.5.7 bit64_4.0.5
[5] jsonlite_1.8.0 R.utils_2.11.0
[7] assertthat_0.2.1 BiocFileCache_2.2.1
[9] blob_1.2.3 GenomeInfoDbData_1.2.7
[11] progress_1.2.2 pillar_1.7.0
[13] RSQLite_2.2.11 lattice_0.20-45
[15] glue_1.6.2 downloader_0.4
[17] digest_0.6.29 XVector_0.34.0
[19] rvest_1.0.2 colorspace_2.0-3
[21] Matrix_1.4-1 R.oo_1.24.0
[23] plyr_1.8.7 XML_3.99-0.9
[25] pkgconfig_2.0.3 biomaRt_2.50.3
[27] zlibbioc_1.40.0 purrr_0.3.4
[29] scales_1.2.0 tzdb_0.2.0
[31] tibble_3.1.7 KEGGREST_1.34.0
[33] generics_0.1.2 TCGAbiolinksGUI.data_1.14.0
[35] ggplot2_3.3.6 ellipsis_0.3.2
[37] cachem_1.0.6 cli_3.2.0
[39] magrittr_2.0.3 crayon_1.5.1
[41] memoise_2.0.1 R.methodsS3_1.8.1
[43] fansi_1.0.3 xml2_1.3.3
[45] tools_4.2.0 data.table_1.14.2
[47] prettyunits_1.1.1 hms_1.1.1
[49] lifecycle_1.0.1 stringr_1.4.0
[51] munsell_0.5.0 DelayedArray_0.20.0
[53] AnnotationDbi_1.56.2 Biostrings_2.62.0
[55] compiler_4.2.0 rlang_1.0.2
[57] grid_4.2.0 RCurl_1.98-1.6
[59] rstudioapi_0.13 rappdirs_0.3.3
[61] bitops_1.0-7 gtable_0.3.0
[63] DBI_1.1.2 curl_4.3.2
[65] R6_2.5.1 knitr_1.38
[67] dplyr_1.0.8 fastmap_1.1.0
[69] bit_4.0.4 utf8_1.2.2
[71] filelock_1.0.2 readr_2.1.2
[73] stringi_1.7.6 parallel_4.2.0
[75] Rcpp_1.0.8.3 vctrs_0.3.8
[77] png_0.1-7 dbplyr_2.1.1
[79] tidyselect_1.1.2 xfun_0.30

tiagochst commented 2 years ago

@PanSX-Dr The version of the package you have is outdated. Could you please update it:

BiocManager::install("BioinformaticsFMRP/TCGAbiolinksGUI.data")
BiocManager::install("BioinformaticsFMRP/TCGAbiolinks")

https://github.com/BioinformaticsFMRP/TCGAbiolinks#installation-from-bioconductor
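After reinstalling, one way to confirm that the update took effect is to compare the installed version against the one reported as failing above (TCGAbiolinks 2.22.4). The sketch below uses only base R; `at_or_below()` is a hypothetical helper name. The thread does not say which release contains the fix, so the check only tells you whether you are still on the reported version or an older one.

```r
# Hypothetical helper: TRUE if the installed version of `pkg` is at or below
# `reported`, FALSE if it is newer, NA if the package is not installed.
at_or_below <- function(pkg, reported) {
  if (!nzchar(system.file(package = pkg))) return(NA)
  utils::packageVersion(pkg) <= package_version(reported)
}

# Version taken from the sessionInfo() in the report above.
still_outdated <- at_or_below("TCGAbiolinks", "2.22.4")
```

A fresh R session is needed after installation, since an already loaded namespace keeps reporting the old version.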
