Open kcleal opened 1 year ago
The script only supports gridss vcfs. These vcfs have one ALT allele per record.
Alternatively, it could be an R/Bioconductor version issue in your environment.
On Tue, 25 July 2023, 12:28 am Kez Cleal, @.***> wrote:
Hi,
I've run into an error running the somatic filter:
Rscript ./GRIDSS/gridss_somatic_filter --input ERR2752450.gridss.vcf --output gridss_hq_somatic.vcf.gz --scriptdir ./GRIDSS/
No reference genome supplied using --ref. Not performing variant equivalence checks.
2023-07-24 13:50:28 Reading ERR2752450.gridss.vcf
Tumour samples: ERR2752450.cram
Matched normals: ERR2752449.cram
Error in `str_detect()`:
! `string` must be a vector, not a object.
Backtrace:
▆
├─global align_breakpoints(full_vcf)
│ └─stringr::str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")
│   └─stringr:::check_lengths(string, pattern)
│     └─vctrs::vec_size_common(...)
└─vctrs:::stop_scalar_type(<fn>(<CmprssCL>), "string", <env>)
  └─vctrs:::stop_vctrs(...)
    └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
Execution halted
Any ideas about how to fix this, thanks?
Thanks @d-cameron for the quick reply. The vcf was generated by gridss. I will make a new environment and try re-installing, thanks!
Hello, I produced tumor/normal VCFs with gridss and would now like to postprocess them with 'gridss_somatic_filter'. I ran into exactly the same error as above and would need advice on what to try next. Any help is greatly appreciated!
------>8---------------------
Test passed 😸
Test passed 🥇
Loading required package: BSgenome
2023-11-01 17:25:03.646492 Reading tumor_vs_normal_all_calls.vcf
Tumour samples: tumor
Matched normals: normal
Error in `str_detect()`:
! `string` must be a vector, not a
<fn>(<CmprssCL>), "string", <env>)

Details:
It is a brand-new gridss conda environment, installed with 'mamba create -n gridss gridss'.
This is my command: 'gridss_somatic_filter --input tumor_vs_normal_all_calls.vcf --output test -n 1 --pondir pondir --ref BSgenome.xxx.yyy.zzz -f test-full'
I produced the necessary files (gridss_pon_breakpoint.bedpe, gridss_pon_single_breakend.bed) as instructed and provided them in 'pondir'.
I am working on a plant and had to build the BSgenome package myself. I tried to build it with the BSgenome R library, version 1.68, in the gridss conda environment, but the build fails with this error:
... Error in .TwoBits_export(mapply(.DNAString_to_twoBit, object, seqnames), : UCSC library operation failed (very similar error when 'ondisk_seq_format: fa')
It builds fine with the Bioconductor BSgenome library, version 1.70, on my system R 4.3, and I am using this BSgenome package (BSgenome.xxx.yyy.zzz).
Update: The bioconductor R-library BSgenome version 1.68 from gridss conda install fails to produce a BSgenome package. It was apparently built (R CMD build) without the --keep-empty-dirs flag, so the necessary directories /inst/extdata/ were missing. Creating them solved the issue. See https://support.bioconductor.org/p/124169/
I can also confirm that my gridss-produced VCF has only one REF and one ALT allele per locus. Example entries are below; some ALT fields do contain ".", though.
bcftools query -f '%CHROM %POS %REF %ALT\n' xxx.vcf
------>8----------
chr01 20422694 T T[chr01:20422705[
chr01 20422705 C ]chr01:20422694]C
chr01 20509080 A A.
chr01 20597157 T .TGAAAAAACAACATCCAGCTATCAGTTCTCAAGAAAAGATAT
chr01 20778566 A ]chr23:23317025]A
chr01 21198059 G G]chr01:21198094]
------>8----------
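For reference, the two kinds of ALT strings in the listing above can be told apart mechanically: breakpoint records carry a mate locus in square brackets, while ALT fields that start or end with "." are single breakends in VCF breakend notation. A minimal sketch, assuming the four-column, space-separated output of the bcftools query command shown above; the function name classify_alt is hypothetical and not part of gridss or bcftools:

```shell
# Hypothetical helper (not part of gridss or bcftools): label each
# "CHROM POS REF ALT" line read from stdin by the kind of ALT it carries.
classify_alt() {
  awk '{
    # Breakpoint ALTs contain a mate locus in square brackets,
    # e.g. T[chr01:20422705[ or ]chr23:23317025]A
    if (index($4, "[") > 0 || index($4, "]") > 0) kind = "breakpoint"
    # Single breakends have no mate locus and start or end with "."
    else if ($4 ~ /^\./ || $4 ~ /\.$/) kind = "single_breakend"
    else kind = "other"
    print $1, $2, kind
  }'
}
```

It could be used as: bcftools query -f '%CHROM %POS %REF %ALT\n' xxx.vcf | classify_alt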
Hello,
I have been having the same error as warthmann above. Has there been any solution to this?
A quick fix that worked for me:
Original:
isbp = str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")
New:
isbp = str_detect(as.character(VariantAnnotation::fixed(vcf)$ALT), "[\\]\\[]")
Then rerun gridss_somatic_filter.
Note: this assumes that the ALT fields contain a single allele per line, which seems to be the case in my GRIDSS output VCF files.
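The single-allele assumption can be checked before applying the patch. A minimal sketch, assuming an uncompressed, tab-delimited VCF in which multiple ALT alleles would appear comma-separated in column 5; count_multiallelic is a hypothetical helper name, not a gridss or bcftools command:

```shell
# Hypothetical helper: count VCF records whose ALT column (column 5)
# lists more than one allele. Assumes a plain-text, tab-delimited VCF.
count_multiallelic() {
  # Skip header lines starting with "#"; a comma in ALT means several alleles.
  # "n + 0" makes awk print 0 rather than an empty string when nothing matches.
  awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }' "$1"
}
```

A result of 0, for example from count_multiallelic tumor_vs_normal_all_calls.vcf, suggests that the as.character() coercion is safe for that file.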
Great! thanks @hberger, your fix worked for me as well. I.e., the script now ran through.