Open DexinYang1998 opened 4 months ago
Hello, In severus, we only provide a sequence for insertions (in the ALT column). For deletion and duplications, we provide the reference position instead.
Best Ayse
Hello,
When I ran quay/biocontainers/severus:0.1.1--pyhdfd78af_0, I also observed an insertion with a REF field as N
, although the real underlying reference sequence is A
(manually checked).
chr4 40402017 SEVERUS_INS11564 N chr4:40402017-40402312 60 PASS PRECISE;SVTYPE=INS;SVLEN=103;CHR2=chr4;DETAILED_TYPE=.;INSLEN=0;MAPQ=60;SUPPREAD=2;HVAF=0.00|0.00|0.18;CLUSTERID=severus_0;PctSeqSimilarity=0;PctSizeSimilarity=0.5754;PctRecOverlap=0;SizeDiff=76;StartDistance=160;EndDistance=160;GTMatch=1;TruScore=19;MatchId=81588.1.0 GT:GQ:VAF:DR:DV 0/0:242:0.09:21:2
I am having trouble running version 1.1 at the moment due to a spike in memory usage.
I just want to flag this. Hope it will be resolved in the future release.
Ideally, as illustrated in the VCF specification (version 4.5) section 5.7 page 38, the reference sequence is written as it is. Ref: https://github.com/samtools/hts-specs/blob/master/VCFv4.5.pdf
Many thanks, Min
Hello @minw2828,
We are currently not providing a reference sequence, so we keep the REF column as N
as it is widely used in SV vcfs. If you are using truvari with severus output, I suggest running it with --pctseq 0
. Alternatively, we developed minda for benchmarking and vcf merging for cancer genomes.
For the memory usage issue with v1.1, can you please share the command line and the severus log file?
Best Ayse
Hi,
Thanks for developing such a wonderful tool for long-read analysis. Recently, I performed Severus to call somatic mutations between paired tumor-normal samples using the command
--target-bam COLO829_merge_haplotagged.bam --control-bam ../colo829bl/COLO829BL_merge_haplotagged.bam --out-dir severus_out -t 80 --phasing-vcf /data/dyang11/COLO829/phasing2/HiC_ONT_haplotype.phased.VCF --vntr-bed /data/dyang11/index/repeats/chm13_VNTR.bed --use-supplementary-tag --vaf-thr 0.05 --min-support 5 --output-read-ids
.The command indeed works but the reference sequence of the output vcf file is N-masked. For example:
Can you help me figure it out? Thank you very much!