BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

GDCprepare Error adding clinical information to samples #629

Open IvanEllson opened 1 month ago


Hello,

I have encountered an error using GDCprepare with gene expression data from BEATAML1.0-COHORT, while trying to add clinical information to samples:

Error in `dplyr::bind_cols()`: ! Can't recycle `..1` (size 0) to match `..2` (size 2).

The error appears with the latest TCGAbiolinks versions (2.32.0 and 2.31.4), but not with older versions such as 2.28.4. It can be reproduced by running the following code (the rlang::last_trace() output is included):

> library(TCGAbiolinks)
> query_BEATAML1.0COHORT <- GDCquery(project = "BEATAML1.0-COHORT",
+                                    data.category = "Transcriptome Profiling",
+                                    data.type = "Gene Expression Quantification",
+                                    data.format = "tsv")
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: BEATAML1.0-COHORT
--------------------
oo Filtering results
--------------------
ooo By data.format
ooo By data.type
----------------
oo Checking data
----------------
ooo Checking if there are duplicated cases
ooo Checking if there are results for the query
-------------------
o Preparing output
-------------------
> 
> GDCdownload(query_BEATAML1.0COHORT)
Downloading data for project BEATAML1.0-COHORT
Of the 735 files for download 735 already exist.
All samples have been already downloaded
> 
> data_BEATAML1.0COHORT <- GDCprepare(query_BEATAML1.0COHORT)
|==================================================================================================================================================|100%                      Completed after 6 s 
Starting to add information to samples
 => Add clinical information to samples
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).
---
Backtrace:
     ▆
  1. └─TCGAbiolinks::GDCprepare(query_BEATAML1.0COHORT)
  2.   └─TCGAbiolinks:::readTranscriptomeProfiling(...)
  3.     └─TCGAbiolinks:::makeSEfromTranscriptomeProfilingSTAR(...)
  4.       └─TCGAbiolinks::colDataPrepare(cases)
  5.         └─TCGAbiolinks:::splitAPICall(...)
  6.           └─base::tryCatch(...)
  7.             └─base (local) tryCatchList(expr, classes, parentenv, handlers)
  8.               └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
  9.                 └─value[[3L]](cond)
 10.                   └─TCGAbiolinks (local) FUN(items[start:end])
 11.                     └─dplyr::bind_cols(df %>% as.data.frame, diagnoses %>% as.data.frame)
Run rlang::last_trace(drop = FALSE) to see 5 hidden frames.
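
For reference, the failure in the last frame of the backtrace can be reproduced outside TCGAbiolinks. This is only a minimal sketch: it assumes that `df` reaches `bind_cols()` with zero rows while `diagnoses` has two, which is what the sizes in the message suggest. The column names and values are invented for illustration and are not taken from the GDC response.

```r
library(dplyr)

# Hypothetical stand-ins for the two tables bound in frame 11 of the backtrace:
# `df` has zero rows and `diagnoses` has two (all values are invented).
df <- data.frame(submitter_id = character(0))
diagnoses <- data.frame(primary_diagnosis = c("dx_a", "dx_b"))

# bind_cols() only recycles inputs of size 1, so 0 rows against 2 rows fails.
# The condition is captured instead of stopping the script.
err <- tryCatch(
  bind_cols(df, diagnoses),
  error = function(e) e
)

# Inspect the condition class and message.
print(class(err))
cat(conditionMessage(err), "\n")
```

If this matches what happens inside `splitAPICall()`, one of the two tables is probably coming back empty for this project in the newer versions, but I have not confirmed that.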

And my R session info:

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.32.0

loaded via a namespace (and not attached):
 [1] writexl_1.5.0               tidyselect_1.2.1            dplyr_1.1.4                 blob_1.2.4                  filelock_1.0.3              Biostrings_2.72.0          
 [7] fastmap_1.2.0               BiocFileCache_2.12.0        XML_3.99-0.16.1             digest_0.6.35               lifecycle_1.0.4             KEGGREST_1.44.0            
[13] RSQLite_2.3.6               magrittr_2.0.3              compiler_4.4.0              rlang_1.1.3                 progress_1.2.3              tools_4.4.0                
[19] utf8_1.2.4                  data.table_1.15.4           knitr_1.46                  prettyunits_1.2.0           S4Arrays_1.4.0              bit_4.0.5                  
[25] curl_5.2.1                  DelayedArray_0.30.1         plyr_1.8.9                  xml2_1.3.6                  librarian_1.8.1             abind_1.4-5                
[31] withr_3.0.0                 purrr_1.0.2                 BiocGenerics_0.50.0         grid_4.4.0                  stats4_4.4.0                fansi_1.0.6                
[37] colorspace_2.1-0            ggplot2_3.5.1               scales_1.3.0                biomaRt_2.60.0              SummarizedExperiment_1.34.0 cli_3.6.2                  
[43] crayon_1.5.2                generics_0.1.3              rstudioapi_0.16.0           bcellViper_1.40.0           httr_1.4.7                  tzdb_0.4.0                 
[49] DBI_1.2.2                   cachem_1.0.8                stringr_1.5.1               zlibbioc_1.50.0             rvest_1.0.4                 AnnotationDbi_1.66.0       
[55] TCGAbiolinksGUI.data_1.24.0 BiocManager_1.30.23         XVector_0.44.0              matrixStats_1.3.0           vctrs_0.6.5                 Matrix_1.7-0               
[61] jsonlite_1.8.8              IRanges_2.38.0              hms_1.1.3                   S4Vectors_0.42.0            bit64_4.0.5                 ggrepel_0.9.5              
[67] tidyr_1.3.1                 glue_1.7.0                  stringi_1.8.4               gtable_0.3.5                GenomeInfoDb_1.40.0         GenomicRanges_1.56.0       
[73] UCSC.utils_1.0.0            munsell_0.5.1               tibble_3.2.1                pillar_1.9.0                rappdirs_0.3.3              GenomeInfoDbData_1.2.12    
[79] R6_2.5.1                    dbplyr_2.5.0                httr2_1.0.1                 lattice_0.22-6              Biobase_2.64.0              readr_2.1.5                
[85] png_0.1-8                   memoise_2.0.1               dorothea_1.16.0             Rcpp_1.0.12                 SparseArray_1.4.3           xfun_0.44                  
[91] downloader_0.4              MatrixGenerics_1.16.0       pkgconfig_2.0.3            

Thank you