Hi imdanique,
Thank you for using our software.
Based on the error message you received, it appears that the input test VCF file "test.22.vcf.gz" does not comply with our software's format requirements. Could you please share the header and the first several variants of that file with us for further diagnosis?
One more question: why are there two "chr22_mq25_mapab100_chr_norm.vcf" present in your input files?
Thank you,
Kai Yuan
@yorkklause Thank you for the quick reply!
Attached is a truncated version of the VCF file that I've been using: test2.vcf.gz
Regarding the two "chr22_mq25_mapab100_chr_norm.vcf" files: these are the Neanderthal and Denisovan files downloaded from http://cdna.eva.mpg.de/neandertal/Vindija/VCF/. Initially, I tried to use the VCFs recommended by the AS2 manual (http://cdna.eva.mpg.de/neandertal/altai/); however, they didn't work, so I switched to the Vindija VCFs. I'm not sure why the initial files didn't work with the tool, but the new files seem OK.
Hi imdanique,
Could you change "chr22" to "22" in your test2.vcf.gz file and try again? It seems AS2 is treating "chr22" and "22" as two different chromosomes.
Thank you
Kai Yuan
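For anyone who wants to try the "chr22" to "22" change suggested above, here is a minimal sketch in Python. The function names `strip_chr_prefix` and `rewrite_vcf` are made up for illustration and are not part of AS2. It assumes a standard tab-separated VCF, reads plain or gzipped input, and writes plain (uncompressed) text.

```python
import gzip


def strip_chr_prefix(lines):
    """Yield VCF lines with a leading "chr" removed from chromosome IDs."""
    for line in lines:
        if line.startswith("##contig=<ID=chr"):
            # Header contig declarations, e.g. ##contig=<ID=chr22,...>
            yield line.replace("##contig=<ID=chr", "##contig=<ID=", 1)
        elif line.startswith("#"):
            # All other header lines are left untouched.
            yield line
        elif line.startswith("chr"):
            # CHROM is the first field, so dropping 3 characters is enough.
            yield line[3:]
        else:
            yield line


def rewrite_vcf(src, dst):
    """Copy src (plain or .gz) to dst (plain text) with the prefix stripped."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(strip_chr_prefix(fin))
```

Usage would look like `rewrite_vcf("test2.vcf.gz", "test2_nochr.vcf")`, where the output file name is only an example. The same renaming has to be applied to every input VCF so that the IDs stay consistent.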
Hi @yorkklause,
I tried both with "chr22" and with "22". When I run vcfs with "22", I get
Loading SNP information from test_nochr.vcf
ArchaicSeeker2: data.cpp:500: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `tpos > prePos' failed.
This error occurs at exactly the VCF that has "22". So if my first two VCFs have "chr22" and the third VCF has "22", the third VCF gets the error:
Loading SNP information from test_chr.vcf
Loading SNP information from YRI.22_chr.vcf
Loading SNP information from denisovan_nochr.vcf
ArchaicSeeker2: data.cpp:500: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `tpos > prePos' failed.
After many trials, I concluded that AS2 requires the "chr22" notation. The problem is that now I get a "Segmentation fault (core dumped)" error after "Loading SNP information" is complete, as in my first message:
Loading SNP information from test_chr.vcf
Loading SNP information from YRI.22_chr.vcf
Loading SNP information from denisovan_chr22.vcf
Loading SNP information from neanderthal_chr22.vcf
Loading genotype information from test.vcf
Segmentation fault (core dumped)
Hi imdanique,
The error message you received:
"ArchaicSeeker2: data.cpp:500: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `tpos > prePos' failed."
occurs because there are two variants with the same physical position (or, less likely, the variant positions are not in increasing order) in the test_nochr.vcf file.
Please check the input VCF files and ensure that the physical positions of all variants are in strictly increasing order.
Additionally, for the chromosome ID, please make sure you are using the same ID across all input files.
My best
Kai Yuan
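A quick way to check the condition described above before running AS2 is a small scan over the VCF. This is a minimal sketch in Python. It mirrors the "strictly increasing positions" requirement as stated in this thread, not AS2's actual code, and the function name `find_position_problems` is hypothetical.

```python
def find_position_problems(lines):
    """Return (line_number, chrom, pos, prev_pos) for every record whose POS
    is not strictly greater than the previous record on the same chromosome."""
    problems = []
    prev_chrom, prev_pos = None, None
    for n, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        # Equal positions (duplicates) and decreasing positions both count.
        if chrom == prev_chrom and pos <= prev_pos:
            problems.append((n, chrom, pos, prev_pos))
        prev_chrom, prev_pos = chrom, pos
    return problems
```

For an uncompressed file, `find_position_problems(open("test_nochr.vcf"))` returns an empty list when every position is strictly increasing. Any entry it returns points at a line that would violate that requirement.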
@yorkklause Thank you for your response.
I’ve noticed that the "Assertion `tpos > prePos' failed" error seems to be related to the chromosome ID notation (at least in this case). Specifically, I get this error only when I change the chromosome ID from "chr22" to "22". If it were position-related, I would get the error every time, whichever ID I choose, because I only change the IDs and not the positions. When the chromosome ID is "chr22", everything works fine and no "Assertion `tpos > prePos' failed" error occurs. However, if I change the chromosome ID to "22" (without altering the positions), I encounter the error.
I ensured there are no duplicate positions by using bcftools norm -D, then I sorted the positions in the VCF files. Finally, I verified that all VCF files have a consistent chromosome ID, which is "chr22" (it doesn't work with "22"). But now I can't get past the "Segmentation fault (core dumped)" error.
Best, Daniyar
Hi Daniyar,
There are two issues with your input files:
1. Chromosome ID (causing the segmentation fault)
2. Physical position (causing the "tpos > prePos" error)
AS2 checks the chromosome ID before checking the physical positions. If the chromosome ID is incorrect, AS2 will throw an error before it checks the physical positions.
For the chromosome ID, it needs to be consistent across all the VCF files and parameter files (anc.par 2nd column; outgroup.par 2nd column; remap.par 2nd column).
Regarding your question, if you change the ID to "chr22", AS2 will not retrieve the recombination information, ancestral allele information, and outgroup information for "chr22" since we use "22" in our parameter files. This mismatch will cause a segmentation fault.
If you use "22" as your chromosome ID, AS2 will accept the chromosome ID but then check your VCF positions.
According to your description, you have sorted the VCF files and removed duplicate variants. Simply change the chromosome ID to "22", and it should work.
My best
Kai Yuan
Alternatively, you can keep your vcf files as they are and change the second column in the parameter files to "chr22".
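The consistency requirement above can be checked with a short script before running AS2. The sketch below is in Python and all function names are hypothetical. It assumes that the parameter files are whitespace-separated with the chromosome ID in the second column, as described in this thread. Whether those files carry a header row is not stated here, so the `header_rows` argument is an assumption to adjust.

```python
def vcf_chrom_ids(lines):
    """Set of chromosome IDs found in the CHROM column of VCF lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def par_chrom_ids(lines, header_rows=0):
    """Set of values in the 2nd whitespace-separated column of a parameter
    file. header_rows (assumed, not confirmed) skips leading header lines."""
    ids = set()
    for line in list(lines)[header_rows:]:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def inconsistent_sources(id_sets):
    """Given {name: set_of_ids}, return the entries whose ID set differs
    from the first entry. An empty dict means all sources agree."""
    reference = next(iter(id_sets.values()))
    return {name: ids for name, ids in id_sets.items() if ids != reference}
```

Collecting one set per input (each VCF plus anc.par, outgroup.par and remap.par) and passing them to `inconsistent_sources` shows at a glance which file still uses "chr22" while the others use "22", or the other way round.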
Hi @yorkklause,
After further investigation, I realized that the problem stemmed from my use of bcftools norm -D. This command removes duplicate variants at identical positions, but not duplicate positions themselves. Since ArchaicSeeker2 requires no duplicate positions, the correct command is bcftools norm -d all.
By processing all input files with the following command and ensuring consistent chromosome ID notation (“22”), the errors have been resolved:
bcftools norm -d all -m-any -Ou test.ch22.vcf.gz | bcftools view -m2 -M2 -v snps -Oz -o test.ch22.norm.vcf.gz -
Thank you for your guidance and support!
Hi, thank you for developing the tool!
I've been trying to run it for several days, but every time I run into an error. I'm running only chr22 of my data, but it throws Segmentation fault (core dumped):
I did everything that was written in the manual (phased with shapeit5, sorted, removed non-SNP variants, split into biallelic records), but I still get the error. Could you please help me resolve the issue? @Zhang-Rui-2018