BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

GDCprepare Error adding clinical information to samples #629

Open IvanEllson opened 6 months ago

IvanEllson commented 6 months ago

Hello,

I have encountered an error when running GDCprepare on gene expression data from BEATAML1.0-COHORT, at the step where clinical information is added to the samples:

Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).

The error occurs with the latest TCGAbiolinks versions (2.32.0 and 2.31.4) but not with older versions such as 2.28.4. It can be reproduced with the following code (rlang::last_trace() output included):

> library(TCGAbiolinks)
> query_BEATAML1.0COHORT <- GDCquery(project = "BEATAML1.0-COHORT",
+                                    data.category = "Transcriptome Profiling",
+                                    data.type = "Gene Expression Quantification",
+                                    data.format = "tsv")
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: BEATAML1.0-COHORT
--------------------
oo Filtering results
--------------------
ooo By data.format
ooo By data.type
----------------
oo Checking data
----------------
ooo Checking if there are duplicated cases
ooo Checking if there are results for the query
-------------------
o Preparing output
-------------------
> 
> GDCdownload(query_BEATAML1.0COHORT)
Downloading data for project BEATAML1.0-COHORT
Of the 735 files for download 735 already exist.
All samples have been already downloaded
> 
> data_BEATAML1.0COHORT <- GDCprepare(query_BEATAML1.0COHORT)
|==================================================================================================================================================|100%                      Completed after 6 s 
Starting to add information to samples
 => Add clinical information to samples
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).
---
Backtrace:
     ▆
  1. └─TCGAbiolinks::GDCprepare(query_BEATAML1.0COHORT)
  2.   └─TCGAbiolinks:::readTranscriptomeProfiling(...)
  3.     └─TCGAbiolinks:::makeSEfromTranscriptomeProfilingSTAR(...)
  4.       └─TCGAbiolinks::colDataPrepare(cases)
  5.         └─TCGAbiolinks:::splitAPICall(...)
  6.           └─base::tryCatch(...)
  7.             └─base (local) tryCatchList(expr, classes, parentenv, handlers)
  8.               └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
  9.                 └─value[[3L]](cond)
 10.                   └─TCGAbiolinks (local) FUN(items[start:end])
 11.                     └─dplyr::bind_cols(df %>% as.data.frame, diagnoses %>% as.data.frame)
Run rlang::last_trace(drop = FALSE) to see 5 hidden frames.
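Frame 11 shows the failing call binding a data frame with zero rows (`..1`) to a two-row `diagnoses` table (`..2`). Here is a minimal sketch of the same error outside TCGAbiolinks. It assumes only that dplyr (1.1.x) is installed, and the table and column names are hypothetical stand-ins, not the package's actual objects:

```r
# Hypothetical stand-ins for the two inputs in frame 11 of the backtrace:
# a sample table that ended up with zero rows, and a two-row diagnoses table.
df <- data.frame(sample = character())
diagnoses <- data.frame(diagnosis = c("d1", "d2"))

# bind_cols() needs equal row counts (only size-1 inputs are recycled),
# so combining 0 rows with 2 rows raises a vctrs size error.
res <- tryCatch(
  dplyr::bind_cols(df, diagnoses),
  error = function(e) e
)

print(class(res))
print(conditionMessage(res))
```

Because a zero-row input cannot be recycled to two rows, the problem lies upstream: whatever step builds the left-hand table is returning no rows for these samples.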

And my R session info:

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.32.0

loaded via a namespace (and not attached):
 [1] writexl_1.5.0               tidyselect_1.2.1            dplyr_1.1.4                 blob_1.2.4                  filelock_1.0.3              Biostrings_2.72.0          
 [7] fastmap_1.2.0               BiocFileCache_2.12.0        XML_3.99-0.16.1             digest_0.6.35               lifecycle_1.0.4             KEGGREST_1.44.0            
[13] RSQLite_2.3.6               magrittr_2.0.3              compiler_4.4.0              rlang_1.1.3                 progress_1.2.3              tools_4.4.0                
[19] utf8_1.2.4                  data.table_1.15.4           knitr_1.46                  prettyunits_1.2.0           S4Arrays_1.4.0              bit_4.0.5                  
[25] curl_5.2.1                  DelayedArray_0.30.1         plyr_1.8.9                  xml2_1.3.6                  librarian_1.8.1             abind_1.4-5                
[31] withr_3.0.0                 purrr_1.0.2                 BiocGenerics_0.50.0         grid_4.4.0                  stats4_4.4.0                fansi_1.0.6                
[37] colorspace_2.1-0            ggplot2_3.5.1               scales_1.3.0                biomaRt_2.60.0              SummarizedExperiment_1.34.0 cli_3.6.2                  
[43] crayon_1.5.2                generics_0.1.3              rstudioapi_0.16.0           bcellViper_1.40.0           httr_1.4.7                  tzdb_0.4.0                 
[49] DBI_1.2.2                   cachem_1.0.8                stringr_1.5.1               zlibbioc_1.50.0             rvest_1.0.4                 AnnotationDbi_1.66.0       
[55] TCGAbiolinksGUI.data_1.24.0 BiocManager_1.30.23         XVector_0.44.0              matrixStats_1.3.0           vctrs_0.6.5                 Matrix_1.7-0               
[61] jsonlite_1.8.8              IRanges_2.38.0              hms_1.1.3                   S4Vectors_0.42.0            bit64_4.0.5                 ggrepel_0.9.5              
[67] tidyr_1.3.1                 glue_1.7.0                  stringi_1.8.4               gtable_0.3.5                GenomeInfoDb_1.40.0         GenomicRanges_1.56.0       
[73] UCSC.utils_1.0.0            munsell_0.5.1               tibble_3.2.1                pillar_1.9.0                rappdirs_0.3.3              GenomeInfoDbData_1.2.12    
[79] R6_2.5.1                    dbplyr_2.5.0                httr2_1.0.1                 lattice_0.22-6              Biobase_2.64.0              readr_2.1.5                
[85] png_0.1-8                   memoise_2.0.1               dorothea_1.16.0             Rcpp_1.0.12                 SparseArray_1.4.3           xfun_0.44                  
[91] downloader_0.4              MatrixGenerics_1.16.0       pkgconfig_2.0.3            

Thank you

kellentjioe commented 4 months ago

I have the same issue! Did you find a way to solve it? Thanks!

warbol commented 3 months ago

The issue is that BEATAML barcodes start with "aq-", which the code is not prepared to handle.

I added a fix: https://github.com/BioinformaticsFMRP/TCGAbiolinks/pull/634
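The thread does not show exactly where TCGAbiolinks assumes a barcode format. The following is a purely hypothetical sketch with made-up IDs and a made-up pattern, not the package's code. It shows how a filter that expects TCGA-style barcodes returns zero rows when every ID starts with "aq-". A zero-row table of that kind is what bind_cols() then rejects:

```r
# Hypothetical sample IDs with the "aq-" prefix mentioned above (made up).
samples <- data.frame(sample_id = c("aq-sample-01", "aq-sample-02"))

# Hypothetical filter that assumes TCGA-style barcodes ("TCGA-XX-XXXX").
tcga_pattern <- "^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}"
matched <- samples[grepl(tcga_pattern, samples$sample_id), , drop = FALSE]

# No ID matches, so the result has zero rows.
print(nrow(matched))
```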

ChristianRohde commented 2 months ago

Hi @warbol, I have exactly that issue and want to use TCGAbiolinks with your updated function. Do you have any advice on how I can do this? Can I install a package version from a development branch? Thank you, Christian

warbol commented 2 months ago

@ChristianRohde

library(remotes)
remotes::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks", ref = remotes::github_pull(634))

This installs the updated package from GitHub pull request #634.

kellentjioe commented 1 month ago

Hi @warbol, thank you! I tried to install the updated package but got the error below. Could you help me with it?

install_github('BioinformaticsFMRP/TCGAbiolinks', 'remotes::github_pull(634)')
Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@remotes::github_pull(634)
Error in utils::download.file(url, path, method = method, quiet = quiet, :
  cannot open URL 'https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/tarball/remotes%3A%3Agithub_pull%28634%29'

Thank you!

warbol commented 1 month ago

You could try the devtools version of install_github instead of the remotes one:

library(devtools)
devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks", ref = devtools::github_pull(634))

I just verified that it works, so it may be a network issue on your end.
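The tarball URL in the earlier error ends in `remotes%3A%3Agithub_pull%28634%29`. That suggests `remotes::github_pull(634)` was passed as a quoted string and used verbatim as a branch/tag name instead of being evaluated. The sketch below checks that reading. URLencode() is used only to show the correspondence; it is not necessarily how remotes builds the URL. The install step is guarded because it needs the remotes package and network access:

```r
# Quoted, the text is never evaluated; it is just a literal ref name.
bad_ref <- "remotes::github_pull(634)"

# Percent-encoding that literal reproduces the ref segment of the failing URL.
encoded_ref <- utils::URLencode(bad_ref, reserved = TRUE)
print(encoded_ref)

if (requireNamespace("remotes", quietly = TRUE)) {
  # Unquoted, the call is evaluated and returns a pull-request ref object.
  good_ref <- remotes::github_pull(634)
  print(class(good_ref))

  # Only attempt the real install in an interactive session.
  if (interactive()) {
    remotes::install_github("BioinformaticsFMRP/TCGAbiolinks", ref = good_ref)
  }
}
```

In other words, the unquoted form from the earlier remotes suggestion should work as written. Wrapping the `github_pull()` call in quotes is what turns it into a nonexistent ref.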