PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Error when running gridss_somatic_filter #635

Open kcleal opened 1 year ago

kcleal commented 1 year ago

Hi,

I've run into an error running the somatic filter:

Rscript ./GRIDSS/gridss_somatic_filter --input ERR2752450.gridss.vcf --output gridss_hq_somatic.vcf.gz --scriptdir ./GRIDSS/
No reference genome supplied using --ref. Not performing variant equivalence checks.
2023-07-24 13:50:28 Reading ERR2752450.gridss.vcf
Tumour samples: ERR2752450.cram
Matched normals: ERR2752449.cram
Error in `str_detect()`:
! `string` must be a vector, not a <CompressedCharacterList> object.
Backtrace:
    ▆
 1. ├─global align_breakpoints(full_vcf)
 2. │ └─stringr::str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")
 3. │   └─stringr:::check_lengths(string, pattern)
 4. │     └─vctrs::vec_size_common(...)
 5. └─vctrs:::stop_scalar_type(`<fn>`(`<CmprssCL>`), "string", `<env>`)
 6.   └─vctrs:::stop_vctrs(...)
 7.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
Execution halted

Any ideas on how to fix this? Thanks.

d-cameron commented 1 year ago

The script only supports GRIDSS VCFs, which have exactly one ALT allele per record.

Alternatively, it could be an R/Bioconductor version issue in your environment.
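
One way to check the "one ALT allele per record" condition without extra tools is to count the records whose ALT column contains a comma, since VCF separates multiple ALT alleles with commas. This is a minimal sketch, assuming an uncompressed, tab-delimited VCF; the function name count_multi_alt is made up for illustration and is not part of GRIDSS.

```shell
# Hypothetical helper (not part of GRIDSS): count VCF records whose
# ALT column (field 5) lists more than one allele.
# Assumes an uncompressed, tab-delimited VCF file as the first argument.
count_multi_alt() {
  awk -F '\t' '
    # Skip header lines; a comma in ALT means several alleles.
    !/^#/ && index($5, ",") > 0 { n++ }
    # "n + 0" prints 0 rather than an empty string when nothing matched.
    END { print n + 0 }
  ' "$1"
}
```

For example, `count_multi_alt ERR2752450.gridss.vcf` prints a single number. A count of 0 is consistent with the one-allele-per-record assumption, which would leave the R/Bioconductor version as the remaining suspect.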

kcleal commented 1 year ago

Thanks @d-cameron for the quick reply. The vcf was generated by gridss. I will make a new environment and try re-installing, thanks!

warthmann commented 1 year ago

Hello, I produced tumor/normal VCFs with gridss and would now like to postprocess them with 'gridss_somatic_filter'. I ran into exactly the same error as above and need advice on what to try next. Any help is greatly appreciated!

------>8---------------------
Test passed 😸
Test passed 🥇
Loading required package: BSgenome
2023-11-01 17:25:03.646492 Reading tumor_vs_normal_all_calls.vcf
Tumour samples: tumor
Matched normals: normal
Error in `str_detect()`:
! `string` must be a vector, not a <CompressedCharacterList> object.
Backtrace:
    ▆
 1. ├─global align_breakpoints(full_vcf)
 2. │ └─stringr::str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")
 3. │   └─stringr:::check_lengths(string, pattern)
 4. │     └─vctrs::vec_size_common(...)
 5. └─vctrs:::stop_scalar_type(`<fn>`(`<CmprssCL>`), "string", `<env>`)
 6.   └─vctrs:::stop_vctrs(...)
 7.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
Execution halted
------------------>8-----------------------

Details: It is a brand new gridss conda environment, installed with 'mamba create -n gridss gridss'. This is my command:

gridss_somatic_filter --input tumor_vs_normal_all_calls.vcf --output test -n 1 --pondir pondir --ref BSgenome.xxx.yyy.zzz -f test-full

I produced the necessary files (gridss_pon_breakpoint.bedpe, gridss_pon_single_breakend.bed) as instructed and provided them in 'pondir'. I am working with a plant genome and had to build the BSgenome package myself. I tried to build it with the R library BSgenome version 1.68 in the gridss conda environment, but the build fails with this error:

... Error in .TwoBits_export(mapply(.DNAString_to_twoBit, object, seqnames), : UCSC library operation failed

(I get a very similar error with 'ondisk_seq_format: fa'.)

It builds fine with the Bioconductor BSgenome library version 1.70 on my system R 4.3, and I am using that BSgenome package (BSgenome.xxx.yyy.zzz).

warthmann commented 1 year ago

Update: The Bioconductor R library BSgenome version 1.68 from the gridss conda install fails to produce a BSgenome package. It was apparently built (R CMD build) without the --keep-empty-dirs flag, so the necessary directories /inst/extdata/ were missing. Creating them solved the issue. See https://support.bioconductor.org/p/124169/

warthmann commented 1 year ago

I can also confirm that my gridss-produced VCF has only one REF and one ALT allele per locus. Example entries are below; some ALT values do contain ".", though.

bcftools query -f '%CHROM %POS %REF %ALT\n' xxx.vcf
------>8----------
chr01 20422694 T T[chr01:20422705[
chr01 20422705 C ]chr01:20422694]C
chr01 20509080 A A.
chr01 20597157 T .TGAAAAAACAACATCCAGCTATCAGTTCTCAAGAAAAGATAT
chr01 20778566 A ]chr23:23317025]A
chr01 21198059 G G]chr01:21198094]
------>8----------

wphillips13 commented 1 year ago

Hello,

I have been having the same error as warthmann above. Has there been any solution to this?

hberger commented 1 year ago

A quick fix that worked for me:

Original:

  isbp = str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")

New:

  isbp = str_detect(as.character(VariantAnnotation::fixed(vcf)$ALT), "[\\]\\[]")  

Then rerun gridss_somatic_filter.

Note: this assumes that the ALT fields contain a single allele per line, which seems to be the case in my GRIDSS output VCF files.
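
The edit above can also be applied from the command line. The sketch below is only an illustration: the function name patch_alt_call is made up, and the thread does not say which file in the script directory contains the line, so the path is left as an argument. Something like `grep -rnF 'str_detect(VariantAnnotation::fixed(vcf)$ALT' ./GRIDSS/` should locate it. The helper keeps a `.bak` copy and matches the text literally, so no regex escaping is needed.

```shell
# Hypothetical helper (not part of GRIDSS): wrap the ALT column in
# as.character() inside the R script given as the first argument.
# A backup of the untouched file is written to "<script>.bak".
patch_alt_call() {
  script="$1"
  old='str_detect(VariantAnnotation::fixed(vcf)$ALT,'
  new='str_detect(as.character(VariantAnnotation::fixed(vcf)$ALT),'
  cp "$script" "$script.bak" || return 1
  awk -v old="$old" -v new="$new" '
    {
      # index() does a literal substring search, not a regex match.
      i = index($0, old)
      if (i > 0) {
        $0 = substr($0, 1, i - 1) new substr($0, i + length(old))
      }
      print
    }
  ' "$script.bak" > "$script"
}
```

Running the helper a second time leaves the line unchanged, because the original text no longer occurs once it has been patched. Note that the second run does overwrite the `.bak` file with the already patched version.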

warthmann commented 1 year ago

Great, thanks @hberger! Your fix worked for me as well: the script now runs to completion.