PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Error while running gridss_somatic_filter #615

Status: Closed (closed by mkyriak 1 year ago)

mkyriak commented 1 year ago

Hello,

I am trying to use GRIDSS with the CHM13v2.0 genome.

I successfully called the variants as follows:

ref=chm13v2.0.fasta
exclude=T2T.excluderanges.bed

gridss \
       -r ${ref} \
       -j ${gridds_dir}/gridss-2.13.2-gridss-jar-with-dependencies.jar \
       -o all_${sample}_calls.vcf \
       -b ${exclude} \
       ${normal} \
       ${tumor} \
       -l nha_p1,${sample}

# create PON dir:
mkdir -p pondir
java -Xmx8g \
       -cp ${gridds_dir}/gridss-2.13.2-gridss-jar-with-dependencies.jar \
       gridss.GeneratePonBedpe \
       $(ls -1 *.vcf.gz | awk ' { print "INPUT=" $0 }' | head -$n) \
       O=pondir/gridss_pon_breakpoint.bedpe \
       SBO=pondir/gridss_pon_single_breakend.bed \
       REFERENCE_SEQUENCE=${ref} NORMAL_ORDINAL=0

However, the following command keeps failing:

# filter the output to keep only the somatic SVs:
gridss_somatic_filter \
        --pondir pondir/ \
        --input all_${sample}_calls.vcf \
        --output ${sample}_high_confidence_somatic.vcf.gz \
        --fulloutput ${sample}_high_and_low_confidence_somatic.vcf.gz \
        --scriptdir $(dirname $(which gridss_somatic_filter)) \
        -n 1 \
        -t 2

Attaching package: ‘Matrix’

The following object is masked from ‘package:S4Vectors’:

    expand

The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Test passed 😀
Test passed 🎉
No reference genome supplied using --ref. Not performing variant equivalence checks.
2023-02-16 14:24:34 Reading all_nha_trf2_deltaB_deltaM_3p6_calls.vcf
Tumour samples: nha_trf2_deltaB_deltaM_3p6
Matched normals: nha_p1
Error in `str_detect()`:
! `string` must be a vector, not a <CompressedCharacterList> object.
Backtrace:
    ▆
 1. ├─global align_breakpoints(full_vcf)
 2. │ └─stringr::str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")
 3. │   └─stringr:::check_lengths(string, pattern)
 4. │     └─vctrs::vec_size_common(...)
 5. └─vctrs:::stop_scalar_type(`<fn>`(`<CmprssCL>`), "string", `<env>`)
 6.   └─vctrs:::stop_vctrs(...)
 7.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
Execution halted

I also tried running it as follows, in case the $(dirname $(which gridss_somatic_filter)) command substitution passed to --scriptdir was causing the problem, but it gave me the same error:

gridss_somatic_filter \
       --pondir pondir/  \
       --input all_${sample}_calls.vcf \
       --output ${sample}_high_confidence_somatic.vcf.gz \
       --fulloutput ${sample}_high_and_low_confidence_somatic.vcf.gz \
       --scriptdir ~/software/gridss/scripts/ \
       -n 1 -t 2

Thank you in advance for your time and help!
Maria

mkyriak commented 1 year ago

I forgot to mention that I am using R version 4.0.3:

R --version
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

All the dependencies were installed successfully in a virtual environment.

(gridss2) [-login1 gridss_run]$ R --version
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

(gridss2) [-login1 gridss_run]$ R

R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(pacman)
> pacman::p_load(argparser,tidyverse,stringdist,testthat,stringr,StructuralVariantAnnotation,rtracklayer,BSgenome)      
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /packages/easy-build/software/R/4.0.3/lib64/R/lib/libRblas.so
LAPACK: /packages/easy-build/software/R/4.0.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome_1.58.0                   StructuralVariantAnnotation_1.6.0
 [3] VariantAnnotation_1.36.0          Rsamtools_2.6.0                  
 [5] Biostrings_2.58.0                 XVector_0.30.0                   
 [7] SummarizedExperiment_1.20.0       Biobase_2.50.0                   
 [9] MatrixGenerics_1.2.1              matrixStats_0.63.0               
[11] rtracklayer_1.50.0                GenomicRanges_1.42.0             
[13] GenomeInfoDb_1.26.7               IRanges_2.24.1                   
[15] S4Vectors_0.28.1                  BiocGenerics_0.36.1              
[17] testthat_3.1.6                    stringdist_0.9.10                
[19] forcats_1.0.0                     stringr_1.5.0                    
[21] dplyr_1.1.0                       purrr_1.0.1                      
[23] readr_2.1.2                       tidyr_1.3.0                      
[25] tibble_3.1.8                      ggplot2_3.4.1                    
[27] tidyverse_1.3.2                   argparser_0.7.1                  
[29] pacman_0.5.1                     

loaded via a namespace (and not attached):
 [1] bitops_1.0-7             fs_1.6.1                 lubridate_1.8.0         
 [4] bit64_4.0.5              progress_1.2.2           httr_1.4.4              
 [7] tools_4.0.3              backports_1.4.1          utf8_1.2.3              
[10] R6_2.5.1                 DBI_1.1.3                colorspace_2.1-0        
[13] withr_2.5.0              prettyunits_1.1.1        tidyselect_1.2.0        
[16] curl_5.0.0               bit_4.0.5                compiler_4.0.3          
[19] cli_3.6.0                rvest_1.0.2              xml2_1.3.3              
[22] DelayedArray_0.16.3      scales_1.2.1             askpass_1.1             
[25] rappdirs_0.3.3           pkgconfig_2.0.3          fastmap_1.1.0           
[28] dbplyr_2.3.0             rlang_1.0.6              readxl_1.4.0            
[31] RSQLite_2.2.20           generics_0.1.3           jsonlite_1.8.4          
[34] BiocParallel_1.24.1      googlesheets4_1.0.0      RCurl_1.98-1.10         
[37] magrittr_2.0.3           GenomeInfoDbData_1.2.4   Matrix_1.4-1            
[40] Rcpp_1.0.10              munsell_0.5.0            fansi_1.0.4             
[43] lifecycle_1.0.3          stringi_1.7.12           zlibbioc_1.36.0         
[46] brio_1.1.3               BiocFileCache_1.14.0     grid_4.0.3              
[49] blob_1.2.3               crayon_1.5.2             lattice_0.20-45         
[52] haven_2.5.0              GenomicFeatures_1.42.3   hms_1.1.2               
[55] pillar_1.8.1             biomaRt_2.46.3           reprex_2.0.1            
[58] XML_3.99-0.13            glue_1.6.2               modelr_0.1.8            
[61] vctrs_0.5.2              tzdb_0.3.0               cellranger_1.1.0        
[64] openssl_2.0.5            gtable_0.3.1             assertthat_0.2.1        
[67] cachem_1.0.6             broom_0.8.0              googledrive_2.0.0       
[70] gargle_1.2.0             AnnotationDbi_1.52.0     GenomicAlignments_1.26.0
[73] memoise_2.0.1            ellipsis_0.3.2          
> 

mnshgl0110 commented 1 year ago

I too have encountered the same issue.

imsarath commented 1 year ago

I also encountered the same issue and resolved it by changing line 780 of libgridss.R so that the ALT column is wrapped in as.vector() before it is passed to str_detect():

isbp = str_detect(as.vector(VariantAnnotation::fixed(vcf)$ALT), "[\\]\\[]")

Hope this helps.
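For anyone who would rather script the edit than change the file by hand, here is a minimal shell sketch. It assumes the unpatched call reads str_detect(VariantAnnotation::fixed(vcf)$ALT, ...), as the backtrace in the original post suggests. The helper name patch_libgridss is hypothetical and not part of GRIDSS. Because the line number may differ between releases, it matches on the text of the call rather than on line 780.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GRIDSS): wraps the ALT column in as.vector()
# wherever str_detect() receives VariantAnnotation::fixed(vcf)$ALT directly.
# Assumes the unpatched call matches the one shown in the backtrace.
patch_libgridss() {
    local file="$1"
    local backup="${file}.bak"
    local tmp
    local status=0

    # Keep one pristine copy; do not overwrite it on repeated runs.
    if [ ! -e "$backup" ]; then
        cp "$file" "$backup" || return 1
    fi

    tmp="$(mktemp)" || return 1

    # An already-patched line no longer matches the pattern, so running the
    # function twice leaves the file unchanged.
    sed 's/str_detect(VariantAnnotation::fixed(vcf)\$ALT,/str_detect(as.vector(VariantAnnotation::fixed(vcf)$ALT),/' \
        "$file" > "$tmp" && cat "$tmp" > "$file" || status=1

    rm -f "$tmp"
    return "$status"
}

# Example (path taken from the second command in the original post):
# patch_libgridss ~/software/gridss/scripts/libgridss.R
```

The unmodified script is kept next to the patched one as libgridss.R.bak, so the change can be reverted by copying the backup over the patched file.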

mkyriak commented 1 year ago

Thank you @imsarath, I will give it a try!

mkyriak commented 1 year ago

Thank you @imsarath, it worked!