Open tabwalsh opened 6 years ago
I'm seeing the same problem. There is an off-by-one error in the position numbers in the VCF file. For example, the first SNP should be at position 27085, not 27086.
Thank you for figuring that out @stevendavis ! It will be on my to do list.......
I think there was/is a bug in treetoreads where the VCF and the CSV don't agree.
@willpitchers also found a VCF wrapping bug.
There appears to be a data mismatch in the Salmonella enterica 1203NYJAP-1 simulated dataset, between the reference alleles at variant positions reported in the VCF on the one hand, and the corresponding bases (or their positions) in the reference sequence on the other.
For example, the first record in the VCF reports a reference allele T in the first contig at position 27086, but this position contains a G in the reference. However, there is a T in the base position immediately before this. This seems to be the case for every variant in this dataset.
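In case it helps anyone reproduce this, here is a minimal sketch of how the mismatch could be checked across the whole file. It is not the script used for the report above: the helper names `parse_fasta` and `check_ref_alleles` are hypothetical, the toy reference and VCF records are made up for illustration, and it assumes an uncompressed, tab-delimited VCF whose POS column is 1-based. It counts how many REF alleles match the reference at POS - 1, POS, and POS + 1.

```python
def parse_fasta(text):
    """Return {contig_name: sequence} from FASTA text (name = first word of header)."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def check_ref_alleles(vcf_text, seqs, offsets=(-1, 0, 1)):
    """Count VCF records whose REF allele matches the reference at POS + offset."""
    matches = {off: 0 for off in offsets}
    total = 0
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3].upper()
        seq = seqs[chrom]
        total += 1
        for off in offsets:
            start = pos - 1 + off  # VCF POS is 1-based; Python strings are 0-based
            if start >= 0 and seq[start:start + len(ref)] == ref:
                matches[off] += 1
    return total, matches


# Toy data (made up): both REF alleles sit one base before the reported POS.
reference = ">contig1 toy example\nACGT\nTGCA\n"
vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "contig1\t6\t.\tT\tA\n"
    "contig1\t3\t.\tC\tG\n"
)
total, matches = check_ref_alleles(vcf, parse_fasta(reference))
print(total, matches)  # 2 {-1: 2, 0: 0, 1: 0}
```

If every record matches at offset -1 and none at offset 0, that would be consistent with the positions being shifted by one rather than with the alleles themselves being wrong.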