Shuhua-Group / ArchaicSeeker2.0

ArchaicSeeker is a series of software for detecting archaic introgression sequences and reconstructing introgression history. The latest version of this series, ArchaicSeeker 2.0, has the following three notable improvements compared with the original version of this software. First, it can automatically determine the boundary of each introgressed sequence. Next, it is capable of tracing both known and unknown ancestral sources of a given introgressed sequence. Finally, it has the ability to reconstruct the introgression history with more sophisticated introgression models.
GNU General Public License v3.0

ArchaicSeeker2: data.cpp:716: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `isphased == phased[index]' failed. #7

Closed VieyraS closed 10 months ago

VieyraS commented 10 months ago

Hello,

When trying to use the data, I get the following output and error:

Loading SNP information from [my path]/data.vcf.gz
Loading genotype information from [my path]/data.vcf.gz
ArchaicSeeker2: data.cpp:716: void genome::vcfsLoad(const std::vector<std::__cxx11::basic_string<char> >&): Assertion `isphased == phased[index]' failed.
Aborted (core dumped)

I have restricted my VCF to biallelic SNPs, set the proper reference allele, and restricted to relevant samples. My data is phased and in v4.2 VCF specification, and I have checked both via bcftools and manually that all sites are formatted as phased properly.

I have checked that I have the proper labels in the pop.par file, and also tried making sure that the samples in said file are in the same order as in the VCF in case that was the issue, but it didn't correct the error.

Please advise on possible solutions. Thank you.

Zhang-Rui-2018 commented 10 months ago

Please check your pop.par file and VCF data file. It seems that some parameters are inconsistent between individuals, which triggers this error message. ArchaicSeeker 2.0 requires specific labels for the African, Archaic, and Test populations. You can post some rows of your pop.par file and the header line (which includes the individual labels) of your VCF file here if appropriate.
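One way to look for per-individual inconsistencies is to tally, for each sample, how many genotypes are written as phased (`|`) versus unphased (`/`). The following is a minimal sketch in plain Python, not part of ArchaicSeeker 2.0; the helper names `open_vcf` and `phase_summary` are hypothetical, and it assumes a tab-delimited VCF whose FORMAT column contains `GT`. It only reports counts and does not establish what triggers the assertion.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def phase_summary(lines):
    """Count phased, unphased and other genotypes per sample.

    `lines` is any iterable of VCF text lines (for example an open file).
    Returns a dict mapping sample ID -> Counter with the keys
    "phased", "unphased" and "other" (haploid or missing calls).
    """
    samples = []
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            counts = {sample: Counter() for sample in samples}
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # record carries no genotype field
        gt_idx = fmt.index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_idx]
            if "|" in gt:
                kind = "phased"
            elif "/" in gt:
                kind = "unphased"
            else:
                kind = "other"
            counts[sample][kind] += 1
    return counts
```

For a file on disk, something like `with open_vcf("data.vcf.gz") as fh: summary = phase_summary(fh)` would give the per-sample counts; any sample with a non-zero `"unphased"` or `"other"` count is worth a closer look.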

VieyraS commented 10 months ago

Sure, here are both of those:

The population file:

ID      Pop     ArchaicSeekerPop
Nea_ind3        Nea     Archaic
Den_ind4        Den     Archaic
Afr_ind5        Afr     African
Afr_ind6        Afr     African
Afr_ind7        Afr     African
Afr_ind8        Afr     African
Afr_ind9        Afr     African
Afr_ind10       Afr     African
Afr_ind11       Afr     African
Afr_ind12       Afr     African
Afr_ind13       Afr     African
Afr_ind14       Afr     African
Afr_ind15       Afr     African
Afr_ind16       Afr     African
Afr_ind17       Afr     African
Afr_ind18       Afr     African
Afr_ind19       Afr     African
Afr_ind20       Afr     African
Afr_ind21       Afr     African
Afr_ind22       Afr     African
Afr_ind23       Afr     African
Afr_ind24       Afr     African
Afr_ind25       Afr     African
Afr_ind26       Afr     African
Afr_ind27       Afr     African
Afr_ind28       Afr     African
Afr_ind29       Afr     African
Afr_ind30       Afr     African
Afr_ind31       Afr     African
Afr_ind32       Afr     African
Afr_ind33       Afr     African
Afr_ind34       Afr     African
OoA_ind35       OoA     Test
OoA_ind36       OoA     Test
OoA_ind37       OoA     Test
OoA_ind38       OoA     Test
OoA_ind39       OoA     Test
OoA_ind40       OoA     Test
OoA_ind41       OoA     Test
OoA_ind42       OoA     Test
OoA_ind43       OoA     Test
OoA_ind44       OoA     Test
OoA_ind45       OoA     Test
OoA_ind46       OoA     Test
OoA_ind47       OoA     Test
OoA_ind48       OoA     Test
OoA_ind49       OoA     Test
OoA_ind50       OoA     Test
OoA_ind51       OoA     Test
OoA_ind52       OoA     Test
OoA_ind53       OoA     Test
OoA_ind54       OoA     Test
OoA_ind55       OoA     Test
OoA_ind56       OoA     Test
OoA_ind57       OoA     Test
OoA_ind58       OoA     Test
OoA_ind59       OoA     Test
OoA_ind60       OoA     Test
OoA_ind61       OoA     Test
OoA_ind62       OoA     Test
OoA_ind63       OoA     Test
OoA_ind64       OoA     Test
Pap_ind65       Pap     Test
Pap_ind66       Pap     Test
Pap_ind67       Pap     Test
Pap_ind68       Pap     Test
Pap_ind69       Pap     Test
Pap_ind70       Pap     Test
Pap_ind71       Pap     Test
Pap_ind72       Pap     Test
Pap_ind73       Pap     Test
Pap_ind74       Pap     Test
Pap_ind75       Pap     Test
Pap_ind76       Pap     Test
Pap_ind77       Pap     Test
Pap_ind78       Pap     Test
Pap_ind79       Pap     Test
Pap_ind80       Pap     Test
Pap_ind81       Pap     Test
Pap_ind82       Pap     Test
Pap_ind83       Pap     Test
Pap_ind84       Pap     Test
Pap_ind85       Pap     Test
Pap_ind86       Pap     Test
Pap_ind87       Pap     Test
Pap_ind88       Pap     Test
Pap_ind89       Pap     Test
Pap_ind90       Pap     Test
Pap_ind91       Pap     Test
Pap_ind92       Pap     Test
Pap_ind93       Pap     Test
Pap_ind94       Pap     Test
Pap_ind95       Pap     Test
Pap_ind96       Pap     Test
Pap_ind97       Pap     Test
Pap_ind98       Pap     Test
Pap_ind99       Pap     Test
Pap_ind100      Pap     Test
Pap_ind101      Pap     Test
Pap_ind102      Pap     Test
Pap_ind103      Pap     Test
Pap_ind104      Pap     Test
Pap_ind105      Pap     Test
Pap_ind106      Pap     Test
Pap_ind107      Pap     Test
Pap_ind108      Pap     Test
Pap_ind109      Pap     Test
Pap_ind110      Pap     Test
Pap_ind111      Pap     Test
Pap_ind112      Pap     Test
Pap_ind113      Pap     Test
Pap_ind114      Pap     Test

The VCF header:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Nea_ind3        Den_ind4        Afr_ind5        Afr_ind6        Afr_ind7        Afr_ind8        Afr_ind9        Afr_ind10       Afr_ind11      Afr_ind12       Afr_ind13       Afr_ind14       Afr_ind15       Afr_ind16       Afr_ind17       Afr_ind18       Afr_ind19       Afr_ind20       Afr_ind21       Afr_ind22       Afr_ind23       Afr_ind24      Afr_ind25       Afr_ind26       Afr_ind27       Afr_ind28       Afr_ind29       Afr_ind30       Afr_ind31       Afr_ind32       Afr_ind33       Afr_ind34       OoA_ind35       OoA_ind36       OoA_ind37      OoA_ind38       OoA_ind39       OoA_ind40       OoA_ind41       OoA_ind42       OoA_ind43       OoA_ind44       OoA_ind45       OoA_ind46       OoA_ind47       OoA_ind48       OoA_ind49       OoA_ind50      OoA_ind51       OoA_ind52       OoA_ind53       OoA_ind54       OoA_ind55       OoA_ind56       OoA_ind57       OoA_ind58       OoA_ind59       OoA_ind60       OoA_ind61       OoA_ind62       OoA_ind63      OoA_ind64       Pap_ind65       Pap_ind66       Pap_ind67       Pap_ind68       Pap_ind69       Pap_ind70       Pap_ind71       Pap_ind72       Pap_ind73       Pap_ind74       Pap_ind75       Pap_ind76      Pap_ind77       Pap_ind78       Pap_ind79       Pap_ind80       Pap_ind81       Pap_ind82       Pap_ind83       Pap_ind84       Pap_ind85       Pap_ind86       Pap_ind87       Pap_ind88       Pap_ind89      Pap_ind90       Pap_ind91       Pap_ind92       Pap_ind93       Pap_ind94       Pap_ind95       Pap_ind96       Pap_ind97       Pap_ind98       Pap_ind99       Pap_ind100      Pap_ind101      Pap_ind102     Pap_ind103      Pap_ind104      Pap_ind105      Pap_ind106      Pap_ind107      Pap_ind108      Pap_ind109      Pap_ind110      Pap_ind111      Pap_ind112      Pap_ind113      Pap_ind114

I have the same individuals in both cases. The populations are labeled properly, as far as I can tell, unless the label in the second column also needs to be Neanderthal/Denisova/YRI/Test. If not, could the issue be related to having more than one test population?

Thank you for your response.

Zhang-Rui-2018 commented 10 months ago

Please put the reference populations and the target population into different VCF files. Phasing is required for the target population but is not necessary for the reference populations. You can take our STAR Protocols paper (https://doi.org/10.1016/j.xpro.2022.101314) as a reference.

The second column of pop.par does not need to be Neanderthal/Denisova/YRI/Test. It can be defined by the user.

When there is more than one test population, we usually analyze the target populations separately. The estimated model parameters can vary between populations, which means the results may be affected if you put multiple target populations into one analysis.
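The advice above (reference and target samples in separate VCF files, and one analysis per target population) can be prepared from the pop.par file itself. The following is a minimal sketch under stated assumptions: the function name `split_pop_par` is hypothetical, the file is whitespace-delimited with the header `ID Pop ArchaicSeekerPop` as posted in this thread, and the third column takes the values `African`, `Archaic` or `Test`. It only builds sample lists; subsetting the VCF itself is left to a tool such as bcftools.

```python
from collections import defaultdict


def split_pop_par(lines):
    """Split pop.par rows into reference IDs and per-population test IDs.

    `lines` is an iterable of text lines, header first. Returns a tuple
    (reference_ids, test_ids): a list of African/Archaic sample IDs and
    a dict mapping each test population label to its list of sample IDs.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows or rows[0][:3] != ["ID", "Pop", "ArchaicSeekerPop"]:
        raise ValueError("unexpected pop.par header")

    reference_ids = []
    test_ids = defaultdict(list)
    for sample_id, pop, role in (row[:3] for row in rows[1:]):
        if role == "Test":
            test_ids[pop].append(sample_id)  # one list per target population
        elif role in ("African", "Archaic"):
            reference_ids.append(sample_id)
        else:
            raise ValueError(f"unknown label {role!r} for {sample_id}")
    return reference_ids, dict(test_ids)
```

With the file posted earlier in this thread, this would yield one reference list (Nea, Den and Afr individuals) and two target lists (`OoA` and `Pap`). Writing each list to a text file with one ID per line gives sample files that can be used to extract a separate VCF for the references and for each target population.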

VieyraS commented 10 months ago

Got it, thank you

Just as a small suggestion, maybe the wording in the manual could be changed to reflect what you explained. The line:

Input data could be either splitted by chromosomes, populations, or combined together in a single file. Our software could get the intersection of data, automatically.

is what led me to believe that it would be okay to leave everything together, despite the example files being different.

Other than that, I just want to thank you for your help.

Zhang-Rui-2018 commented 10 months ago

Thank you for your suggestion! We will modify this sentence so that it is no longer misleading.