GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

Error in xgb.get.handle(object): 'xgb.Booster' object is corrupted or is from an incompatible xgboost version. #447

Open nick-youngblut opened 2 months ago

nick-youngblut commented 2 months ago

My command:

bamba_ret = bambu(
    reads = bam_file, 
    annotations = gtf_file, 
    genome = fna_file, 
    quant = FALSE,
    ncore = 8
)

The error:

Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in xgb.get.handle(object): 'xgb.Booster' object is corrupted or is from an incompatible xgboost version.

Traceback:

1. bambu(reads = bam_file, annotations = gtf_file, genome = fna_file, 
 .     quant = FALSE, ncore = 8)
2. bambu.processReads(reads, annotations, genomeSequence = genome, 
 .     readClass.outputDir = rcOutDir, yieldSize, bpParameters, 
 .     stranded, verbose, isoreParameters, trackReads = trackReads, 
 .     fusionMode = fusionMode, lowMemory = lowMemory)
3. bplapply(names(reads), function(bamFileName) {
 .     bambu.processReadsByFile(bam.file = reads[bamFileName], genomeSequence = genomeSequence, 
 .         annotations = annotations, readClass.outputDir = readClass.outputDir, 
 .         stranded = stranded, min.readCount = min.readCount, fitReadClassModel = fitReadClassModel, 
 .         min.exonOverlap = min.exonOverlap, defaultModels = defaultModels, 
 .         returnModel = returnModel, verbose = verbose, lowMemory = lowMemory, 
 .         trackReads = trackReads, fusionMode = fusionMode)
 . }, BPPARAM = bpParameters)
4. bplapply(names(reads), function(bamFileName) {
 .     bambu.processReadsByFile(bam.file = reads[bamFileName], genomeSequence = genomeSequence, 
 .         annotations = annotations, readClass.outputDir = readClass.outputDir, 
 .         stranded = stranded, min.readCount = min.readCount, fitReadClassModel = fitReadClassModel, 
 .         min.exonOverlap = min.exonOverlap, defaultModels = defaultModels, 
 .         returnModel = returnModel, verbose = verbose, lowMemory = lowMemory, 
 .         trackReads = trackReads, fusionMode = fusionMode)
 . }, BPPARAM = bpParameters)
5. .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, 
 .     BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)

I've tried restarting the R kernel, but that did not help. It appears that the xgboost version that I've installed (xgboost_2.1.1.1) is not compatible with the model utilized by default by bambu.

I don't see any version specifications for xgboost in the README or elsewhere. Which versions of xgboost are compatible with the default model?

sessionInfo

R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS/LAPACK: /home/nickyoungblut/miniforge3/envs/ont_10x/lib/libopenblasp-r0.3.27.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] bambu_3.4.0                 BSgenome_1.70.1            
 [3] rtracklayer_1.62.0          BiocIO_1.12.0              
 [5] Biostrings_2.70.1           XVector_0.42.0             
 [7] SummarizedExperiment_1.32.0 Biobase_2.62.0             
 [9] GenomicRanges_1.54.1        GenomeInfoDb_1.38.1        
[11] IRanges_2.36.0              S4Vectors_0.40.2           
[13] BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
[15] matrixStats_1.3.0           ArcRUtils_0.1.0            
[17] ggplot2_3.5.1               tidyr_1.3.1                
[19] dplyr_1.1.4                

loaded via a namespace (and not attached):
 [1] DBI_1.2.3                bitops_1.0-7             biomaRt_2.58.0          
 [4] rlang_1.1.3              magrittr_2.0.3           compiler_4.3.3          
 [7] RSQLite_2.3.7            GenomicFeatures_1.54.1   png_0.1-8               
[10] vctrs_0.6.5              stringr_1.5.1            pkgconfig_2.0.3         
[13] crayon_1.5.2             fastmap_1.1.1            dbplyr_2.5.0            
[16] utf8_1.2.4               Rsamtools_2.18.0         purrr_1.0.2             
[19] bit_4.0.5                zlibbioc_1.48.0          cachem_1.0.8            
[22] jsonlite_1.8.8           progress_1.2.3           blob_1.2.4              
[25] DelayedArray_0.28.0      uuid_1.2-0               BiocParallel_1.36.0     
[28] parallel_4.3.3           prettyunits_1.2.0        R6_2.5.1                
[31] stringi_1.8.4            xgboost_2.1.1.1          Rcpp_1.0.12             
[34] IRkernel_1.3.2           base64enc_0.1-3          Matrix_1.6-5            
[37] tidyselect_1.2.1         abind_1.4-5              yaml_2.3.8              
[40] codetools_0.2-20         curl_5.1.0               lattice_0.22-6          
[43] tibble_3.2.1             withr_3.0.0              KEGGREST_1.42.0         
[46] evaluate_0.23            BiocFileCache_2.10.1     xml2_1.3.6              
[49] BiocManager_1.30.23      pillar_1.9.0             filelock_1.0.3          
[52] generics_0.1.3           RCurl_1.98-1.14          IRdisplay_1.1           
[55] hms_1.1.3                munsell_0.5.1            scales_1.3.0            
[58] glue_1.7.0               tools_4.3.3              data.table_1.15.2       
[61] GenomicAlignments_1.38.0 pbdZMQ_0.3-11            XML_3.99-0.16.1         
[64] grid_4.3.3               AnnotationDbi_1.64.1     colorspace_2.1-0        
[67] GenomeInfoDbData_1.2.11  repr_1.1.7               restfulr_0.0.15         
[70] cli_3.6.2                rappdirs_0.3.3           fansi_1.0.6             
[73] S4Arrays_1.2.0           gtable_0.3.5             digest_0.6.35           
[76] SparseArray_1.2.2        rjson_0.2.21             memoise_2.0.1           
[79] htmltools_0.5.8.1        lifecycle_1.0.4          httr_1.4.7              
[82] bit64_4.0.5             

nick-youngblut commented 2 months ago

I see that this issue might be addressed with https://github.com/GoekeLab/bambu/pull/386.

In that PR, I don't see any info on the original and updated versions. Which version of xgboost was used to train the default model, and which was used to create the updated models for the PR?

andredsim commented 2 months ago

Thanks for reporting this.

Did you install Bambu via Bioconductor or GitHub? Was xgboost installed separately or as part of the dependency installation? If possible, I would recommend updating to R >4.4 and reinstalling Bambu at the latest version (3.5.1) through Bioconductor, which should install a compatible version of xgboost (v1.7.8.1 from what I just tested). It is good to know that the latest version of xgboost causes this error; we will need to be ready for when the currently compatible version no longer works with R.

Let me know if updating R/Bambu/downgrading xgboost solves your issue.
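
For reference, a minimal sketch of the check and downgrade suggested here. It assumes that the incompatibility starts at xgboost 2.0.0, which is inferred only from the two versions named in this thread (1.7.8.1 working, 2.1.1.1 failing). `is_xgboost_1x` is a hypothetical helper, not part of bambu or xgboost.

```r
# Hypothetical helper: TRUE if an xgboost version string is in the 1.x series.
# The 2.0.0 cutoff is an assumption based on the versions named in this thread.
is_xgboost_1x <- function(version) {
  package_version(version) < "2.0.0"
}

# Report the installed version, if any, without attaching the package.
if (nzchar(system.file(package = "xgboost"))) {
  v <- as.character(utils::packageVersion("xgboost"))
  message("xgboost ", v, if (is_xgboost_1x(v)) " (1.x series)" else " (2.x or later)")
}

# To downgrade from CRAN (not run here; needs the 'remotes' package and
# network access):
# remotes::install_version("xgboost", version = "1.7.8.1")
```

In a conda-managed environment, pinning the package through conda may be preferable to installing from CRAN, since mixing the two can leave the environment inconsistent.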

Kind Regards, Andre Sim

nick-youngblut commented 2 months ago

Thanks for the suggestion.

Do you constrain the supported versions of xgboost in the package setup? I don't see any such constraints on xgboost in the DESCRIPTION.

If there are no version constraints, why would a re-install of bambu install anything other than the latest version of xgboost, which is now in v2, while you are testing on v1.7 ("v1.7.8.1 from what I just tested")?

andredsim commented 2 months ago

The CRAN version of xgboost is still v1.7.8 (https://cran.rstudio.com/web/packages/xgboost/index.html), which is what Bioconductor will install as a dependency, I believe. To confirm that this is correct, may I ask whether you installed xgboost independently or via Bioconductor? Presumably xgboost's version will increase to v2 in the next Bioconductor release, so, thanks to your report, we will make sure to either constrain the version or make sure it's compatible.
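
One way such a constraint could fail early with a readable message is sketched below. This is not bambu's code: `assert_xgboost_supported` and its `below` bound are hypothetical, and the 2.0.0 cutoff is an assumption drawn from the versions mentioned in this thread.

```r
# Hypothetical guard a package could run before loading a serialized model.
# The default upper bound is an assumption, not a documented requirement.
assert_xgboost_supported <- function(installed, below = "2.0.0") {
  installed <- package_version(as.character(installed))
  if (installed >= below) {
    stop("xgboost ", as.character(installed),
         " is not supported by the bundled model; ",
         "install a version below ", below, call. = FALSE)
  }
  invisible(TRUE)
}
```

It could be called as `assert_xgboost_supported(utils::packageVersion("xgboost"))`, so that an unsupported version produces a clear message instead of the `xgb.get.handle` error.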

nick-youngblut commented 2 months ago

The latest conda-forge version of xgboost is 2.1.1.

I used the bambu Bioconda recipe, which does not specify particular xgboost versions.

It might be best to contact the authors of this bioconda recipe:

[Screenshot, 2024-09-23 8:29:38 AM]

cying111 commented 2 months ago

Hi @nick-youngblut,

thanks for reporting this.

However, I believe there is some confusion here. As you know, xgboost can refer to either the software or the R package. For bambu, the xgboost version always refers to that of the R xgboost package. The latest version of that package on CRAN is v1.7.8.1, which is the same version that is in Bioconductor; see attached: [image]

Hope this clarifies your question! Thanks, Ying