Shuhua-Group / ArchaicSeeker2.0

ArchaicSeeker is a family of software tools for detecting archaic introgressed sequences and reconstructing introgression history. The latest version, ArchaicSeeker 2.0, has three notable improvements over the original. First, it can automatically determine the boundary of each introgressed sequence. Second, it can trace both known and unknown ancestral sources of a given introgressed sequence. Finally, it can reconstruct the introgression history with more sophisticated introgression models.
GNU General Public License v3.0

Segmentation fault (core dumped) #10

Closed imdanique closed 2 months ago

imdanique commented 2 months ago

Hi, thank you for developing the tool!

I've been trying to run it for several days, but every time I run into an error. I'm running only chr22 of my data, but it fails with "Segmentation fault (core dumped)":

Loading SNP information from test.22.vcf.gz
Loading SNP information from YRI.22_chr_norm.vcf
Loading SNP information from chr22_mq25_mapab100_chr_norm.vcf
Loading SNP information from chr22_mq25_mapab100_chr_norm.vcf
Loading genotype information from test.22.vcf.gz
Segmentation fault (core dumped)

I did everything that was written in the manual (phased with shapeit5, sorted, removed non-SNP variants, kept only biallelic sites), but I still get the error. Could you help me resolve the issue, please? @Zhang-Rui-2018

yorkklause commented 2 months ago

Hi imdanique,

Thank you for using our software.

Based on the error message you received, it appears that the input test VCF file "test.22.vcf.gz" does not comply with our software's format requirements. Could you please share the header and the first several variants of that file with us for further diagnosis?

One more question: why are there two "chr22_mq25_mapab100_chr_norm.vcf" present in your input files?

Thank you,

Kai Yuan

imdanique commented 2 months ago

@yorkklause Thank you for the quick reply!

Attached is the truncated version of the vcf file that I've been using test2.vcf.gz

Regarding the two "chr22_mq25_mapab100_chr_norm.vcf" files, these are the Neanderthal and Denisovan files downloaded from http://cdna.eva.mpg.de/neandertal/Vindija/VCF/. Initially, I tried to use the VCFs recommended by the AS2 manual (http://cdna.eva.mpg.de/neandertal/altai/), but they didn't work, so I switched to the Vindija VCFs. I'm not sure why the initial files didn't work with the tool, but the new files seem OK.

yorkklause commented 2 months ago

Hi imdanique,

Could you change "chr22" to "22" in your test2.vcf.gz file and try again? It seems AS2 is treating "chr22" and "22" as two different chromosomes.
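The renaming suggested here can be sketched in Python as follows. This is a minimal illustration, not part of AS2: the function name `strip_chr_prefix` is hypothetical, and it assumes an uncompressed, tab-delimited VCF whose chromosome IDs differ from the desired ones only by a leading "chr".

```python
def strip_chr_prefix(lines):
    """Yield VCF lines with a leading 'chr' removed from chromosome IDs.

    Rewrites ##contig header lines and the CHROM column of data lines;
    every other line passes through unchanged.
    """
    contig_tag = "##contig=<ID=chr"
    for line in lines:
        if line.startswith(contig_tag):
            # Header: ##contig=<ID=chr22,...> becomes ##contig=<ID=22,...>
            yield line.replace(contig_tag, "##contig=<ID=", 1)
        elif not line.startswith("#") and line.startswith("chr"):
            # Data line: CHROM is the first column, so drop the first 3 chars
            yield line[3:]
        else:
            yield line


# Small in-memory demonstration (hypothetical records)
demo = ["##contig=<ID=chr22>\n", "#CHROM\tPOS\tID\tREF\tALT\n", "chr22\t100\t.\tA\tG\n"]
print("".join(strip_chr_prefix(demo)), end="")
```

For a real file, the same generator could be fed an open file handle (or `gzip.open(path, "rt")` for a compressed VCF) and its output written to a new file.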

Thank you

Kai Yuan

imdanique commented 2 months ago

Hi @yorkklause,

I tried both with "chr22" and with "22". When I run vcfs with "22", I get

Loading SNP information from test_nochr.vcf
ArchaicSeeker2: data.cpp:500: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `tpos > prePos' failed.

This error occurs at exactly the VCF that has "22". So if my first two VCFs have "chr22" and the third VCF has "22", the third VCF triggers the error:

Loading SNP information from test_chr.vcf
Loading SNP information from YRI.22_chr.vcf
Loading SNP information from denisovan_nochr.vcf
ArchaicSeeker2: data.cpp:500: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `tpos > prePos' failed.

After many trials, I concluded that AS2 requires the "chr22" notation. The problem is that now I get a "Segmentation fault (core dumped)" error after "Loading SNP information" is complete, as in my first message:

Loading SNP information from test_chr.vcf
Loading SNP information from YRI.22_chr.vcf
Loading SNP information from denisovan_chr22.vcf
Loading SNP information from neanderthal_chr22.vcf
Loading genotype information from test.vcf
Segmentation fault (core dumped)

yorkklause commented 2 months ago

Hi imdanique,

The error message you received:

"ArchaicSeeker2: data.cpp:500: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `tpos > prePos' failed."

occurs because there are two variants with the same physical position (or, less likely, the variant positions are not in increasing order) in the test_nochr.vcf file.

Please check the input VCF files and ensure that the physical positions of all variants are in strictly increasing order.

Additionally, for the chromosome ID, please make sure you are using the same ID across all input files.
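A quick way to look for the position problem described above is to scan each VCF for the first record whose position is not strictly greater than the previous one. The sketch below is only an approximation of what the `tpos > prePos` assertion appears to test; the function name `first_position_violation` is hypothetical, and it assumes tab-delimited VCF lines.

```python
def first_position_violation(lines):
    """Return (chrom, previous_pos, pos) for the first record whose POS is not
    strictly greater than the previous record on the same chromosome.

    Returns None if positions are strictly increasing within each chromosome.
    """
    prev_chrom, prev_pos = None, None
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if chrom == prev_chrom and pos <= prev_pos:
            return chrom, prev_pos, pos
        prev_chrom, prev_pos = chrom, pos
    return None


# Hypothetical records: the second line repeats position 100
records = ["22\t100\t.\tA\tG\n", "22\t100\t.\tA\tT\n", "22\t250\t.\tC\tT\n"]
print(first_position_violation(records))
```

A return value of `None` means no duplicate or out-of-order position was found; anything else points at the first offending pair of positions.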

My best

Kai Yuan

imdanique commented 2 months ago

@yorkklause Thank you for your response.

I’ve noticed that the "Assertion `tpos > prePos' failed" error seems to be related to the chromosome ID notation (at least in this case). Specifically, I get this error only when I change the chromosome ID from “chr22” to “22”. If it were position-related, I would get the error every time, no matter which ID I choose, because I only change the IDs and not the positions. When the chromosome ID is “chr22”, everything works fine and no "Assertion `tpos > prePos' failed" error occurs. However, if I change the chromosome ID to “22” (without altering the positions), I encounter the error.

I ensured there are no duplicate positions by using bcftools norm -D, then I sorted the positions in the VCF files. Finally, I verified that all VCF files have consistent chromosome IDs, namely “chr22” (it doesn't work with "22"). But now I can't get past the "Segmentation fault (core dumped)" error.

Best, Daniyar

yorkklause commented 2 months ago

Hi Daniyar,

There are two issues with your input files:

1. Chromosome ID (causing the segmentation fault)
2. Physical position (causing the "tpos > prePos" error)

AS2 checks the chromosome ID before checking the physical positions. If the chromosome ID is incorrect, AS2 will throw an error before it checks the physical positions.

For the chromosome ID, it needs to be consistent across all the VCF files and parameter files (anc.par 2nd column; outgroup.par 2nd column; remap.par 2nd column).
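The consistency requirement above can be checked before running AS2 with a small script like the following. It is a sketch under assumptions: the thread only says that the chromosome ID sits in the 2nd column of the parameter files, so whitespace-delimited columns are assumed here, and all function names and the sample lines are hypothetical.

```python
def vcf_chroms(lines):
    """Collect the chromosome IDs (first column) used by a VCF's data lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def par_chroms(lines):
    """Collect the 2nd-column values of a parameter file.

    Assumes whitespace-delimited columns (an assumption, not confirmed here).
    """
    chroms = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            chroms.add(fields[1])
    return chroms


def unmatched_chroms(vcf_ids_by_file, par_ids_by_file):
    """List (vcf_name, chrom, par_name) for every VCF chromosome ID that is
    absent from the 2nd column of a parameter file."""
    problems = []
    for vcf_name, vcf_ids in vcf_ids_by_file.items():
        for par_name, par_ids in par_ids_by_file.items():
            for chrom in sorted(vcf_ids - par_ids):
                problems.append((vcf_name, chrom, par_name))
    return problems


# Hypothetical contents: the VCF uses "chr22" while the parameter file uses "22"
vcfs = {"test.vcf": vcf_chroms(["#CHROM\tPOS\n", "chr22\t100\t.\tA\tG\n"])}
pars = {"anc.par": par_chroms(["some_file 22\n"])}
print(unmatched_chroms(vcfs, pars))
```

An empty list means every chromosome ID found in the VCFs also appears in each parameter file; any entry names the VCF, the ID, and the parameter file that lacks it.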

Regarding your question, if you change the ID to "chr22", AS2 will not retrieve the recombination information, ancestral allele information, and outgroup information for "chr22" since we use "22" in our parameter files. This mismatch will cause a segmentation fault.

If you use "22" as your chromosome ID, AS2 will accept the chromosome ID but then check your VCF positions.

According to your description, you have sorted the VCF files and removed duplicate variants. Simply change the chromosome ID to "22", and it should work.

My best

Kai Yuan

yorkklause commented 2 months ago

Alternatively, you can keep your vcf files as they are and change the second column in the parameter files to "chr22".

imdanique commented 2 months ago

Hi @yorkklause,

After further investigation, I realized that the problem stemmed from my use of bcftools norm -D. This option removes only exact duplicate records (identical variants at the same position); it does not remove different variants that share a position. Since ArchaicSeeker2 requires that no position appears more than once, the correct command is bcftools norm -d all.
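To see which kind of duplicate a file contains, the records sharing a position can be listed together with their alleles. The sketch below is illustrative only (the function name `duplicate_positions` is hypothetical) and assumes tab-delimited VCF lines with the standard CHROM, POS, ID, REF, ALT leading columns.

```python
from collections import defaultdict


def duplicate_positions(lines):
    """Map (chrom, pos) -> list of (ref, alt) for positions seen more than once."""
    seen = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        seen[(chrom, int(pos))].append((ref, alt))
    return {key: alleles for key, alleles in seen.items() if len(alleles) > 1}


# Hypothetical records: position 100 carries two different ALT alleles
records = [
    "22\t100\t.\tA\tG\t.\tPASS\t.\n",
    "22\t100\t.\tA\tT\t.\tPASS\t.\n",
    "22\t200\t.\tC\tT\t.\tPASS\t.\n",
]
for (chrom, pos), alleles in duplicate_positions(records).items():
    kind = "exact duplicates" if len(set(alleles)) == 1 else "different variants, same position"
    print(chrom, pos, alleles, kind)
```

Positions reported as "different variants, same position" are the ones that would remain after removing only exact duplicates.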

By processing all input files with the following command and ensuring consistent chromosome ID notation (“22”), the errors have been resolved:

bcftools norm -d all -m-any -Ou test.ch22.vcf.gz | bcftools view -m2 -M2 -v snps -Oz -o test.ch22.norm.vcf.gz -

Thank you for your guidance and support!