Open ethering opened 7 months ago
Hi @ethering have you been able to overcome this issue?
I'm observing the same behavior, and this issue is breaking some downstream analysis (liftover).
A simple check with
bcftools norm --check-ref e --fasta-ref <fasta_file> <vcf_file> -o <vcf_norm_file>
Gives errors like the following:
Reference allele mismatch at chr7 .. REF_SEQ:'GCTCTTCTCTTCTT' vs VCF:'CTCTTCTCTTCTTA
Besides the issue with POS
I would like to add that the END
information field seems 1-based to me, which is correct according to the vcf specification, but inconsistent with the 0-based POS
. This brakes the following relation from the vcf specification:
For precise variants, END is POS + length of REF allele − 1, and the for imprecise variants the corresponding best estimate.
A simple example of the issue showing a diff between Reference and produced fasta, and the corresponding vcf:
(The lines in the fasta file are of length = 100)
The deletion should have:
POS = 307 (1 based)
LEN = 14
END = 320 (1 based)
307 + 14 -1 = 320
Hi @ethering have you been able to overcome this issue?
I'm observing the same behavior, and this issue is breaking some downstream analysis (liftover).
A simple check with
bcftools norm --check-ref e --fasta-ref <fasta_file> <vcf_file> -o <vcf_norm_file>
Gives errors like the following:
Reference allele mismatch at chr7 .. REF_SEQ:'GCTCTTCTCTTCTT' vs VCF:'CTCTTCTCTTCTTA
No. I modified my workflow so that it didn't require the VCF file. One alternative would be to create a script that increments the POS position by one.
Hi, I'm using SURVIVOR (v1.0.7) for the first time and trying out some test data. I've generated a params file using
SURVIVOR simSV test_params.param
and then simulated the SVs onto a reference genome.SURVIVOR simSV ref.fasta test_params.param 0.1 0 simulated
Upon viewing the generated
simulated.vcf
file, I see that the VCF file is using zero-based coordinates when VCF uses 1-based coordinates. Here are the first 16 characters of my reference sequence:Here are the first 16 characters of my simulated sequence:
and here are the lines in the VCF file that relate to the variants (the VCF header states 'source=Sniffles').
As you can see, the first variant is at position zero and then the subsequent co-ordinates are also zero-based. The actual 'POS' values should be 1, 4, 10, and 16.
Based on my initial use of SURVIVOR, I have two other comments:
SURVIVOR simSV --help
gives the stdout ofParameter file generated
and produces a file called--help