DReichLab / AdmixTools

Tools test whether admixture occurred and more

qpDstats produced strange results for some combination of species #59

Open smallfishcui opened 4 years ago

smallfishcui commented 4 years ago

Hi,

I am using qpDstat in AdmixTools to detect introgression among species. I used the convertVCFtoEigenstrat.sh script to convert a VCF file to EIGENSTRAT format and assigned each individual to a population. When I run qpDstat, some analyses run fine, but others show only 0, and the ones that run fine are all significant. Could anybody tell me what the problem is? Here is part of my result:

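|  | W | X | Y | Z | D | stderr | Zscore | BABA | ABBA | nsnps |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Med | CN | AU | EU | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | Med | CN | AU | Usland | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | Med | CN | AU | Usnat | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | Med | CN | AU | OG | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | Med | CN | EU | AU | 0 | 1 | 0 | 0 | 0 | 0 |
| 6 | Med | CN | EU | Usland | 0 | 1 | 0 | 0 | 0 | 0 |
| 7 | Med | CN | EU | Usnat | 0 | 1 | 0 | 0 | 0 | 0 |
| 8 | Med | CN | EU | OG | 0 | 1 | 0 | 0 | 0 | 0 |
| 9 | Med | CN | Usland | AU | 0 | 1 | 0 | 0 | 0 | 0 |
| 10 | Med | CN | Usland | EU | 0 | 1 | 0 | 0 | 0 | 0 |
| 37 | Med | AU | OG | CN | 0 | 1 | 0 | 0 | 0 | 0 |
| 38 | Med | AU | OG | EU | 0 | 1 | 0 | 0 | 0 | 0 |
| 39 | Med | AU | OG | Usland | 0 | 1 | 0 | 0 | 0 | 0 |
| 40 | Med | AU | OG | Usnat | 0 | 1 | 0 | 0 | 0 | 0 |
| 41 | Med | EU | CN | AU | 0 | 1 | 0 | 0 | 0 | 0 |
| 42 | Med | EU | CN | Usland | -0,3349 | 0,035253 | -9,5 | 2485 | 4986 | 214014 |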

Thanks, Cui
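
A reading aid for the table above: the D column looks consistent with the usual normalized difference of the two site-pattern counts, D = (BABA - ABBA) / (BABA + ABBA). The sketch below only illustrates that relation and is not AdmixTools code. The function name `d_from_counts` is made up for this note, and the counts in the table appear to be rounded, so the result only approximately matches the reported value.

```python
from typing import Optional


def d_from_counts(baba: float, abba: float) -> Optional[float]:
    """Normalized difference of site-pattern counts.

    Returns None when both counts are zero, because the ratio is
    undefined in that case (hypothetical helper, not part of AdmixTools).
    """
    total = baba + abba
    if total == 0:
        return None
    return (baba - abba) / total


# Row 42 of the table: BABA = 2485, ABBA = 4986 (reported D is about -0.335)
print(d_from_counts(2485, 4986))

# A row with zero counts carries no information about D at all
print(d_from_counts(0, 0))
```

Under this reading, the rows that show D = 0 with nsnps = 0 are not estimates of zero; they are rows for which no usable sites were counted.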

bumblenick commented 4 years ago

For all except the last line you have almost no data. Check the AU genotype carefully!

Nick
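
One way to follow up on this suggestion is to measure how much of each population's genotype data is missing. The sketch below assumes unpacked, text EIGENSTRAT files: one line per SNP in the .geno file with one character per individual, where 9 marks a missing genotype, and the population label in the third column of the .ind file. The function name and the toy data are hypothetical and are not taken from the thread.

```python
from collections import defaultdict


def missing_rate_by_population(geno_lines, ind_lines):
    """Fraction of missing genotype calls ('9') per population.

    geno_lines: iterable of text EIGENSTRAT .geno lines (one per SNP)
    ind_lines:  iterable of .ind lines (sample ID, sex, population)
    """
    pops = [line.split()[2] for line in ind_lines if line.strip()]
    missing = defaultdict(int)
    total = defaultdict(int)
    for row in geno_lines:
        row = row.strip()
        if not row:
            continue
        if len(row) != len(pops):
            raise ValueError("genotype row length does not match .ind file")
        for code, pop in zip(row, pops):
            total[pop] += 1
            if code == "9":
                missing[pop] += 1
    return {pop: missing[pop] / total[pop] for pop in total}


# Toy data: one Med sample with complete data, two AU samples mostly missing
toy_ind = ["s1 U Med", "s2 U AU", "s3 U AU"]
toy_geno = ["299", "199", "209"]
print(missing_rate_by_population(toy_geno, toy_ind))
```

With real data, the two arguments could be open file handles for the .geno and .ind files. A population whose rate is close to 1 has almost no usable genotypes.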


smallfishcui commented 4 years ago

Hi Nick,

Thank you for the suggestion. Yes, the AU population might be problematic. I suspect all of them are problematic to some degree, because they are all polyploids and I aligned them against a reference genome of the same species. All individuals now seem to have an excess of homozygous reference genotypes: for example, almost all AU sites have the genotype 0/0, and the same holds for the other populations. In that case, I guess that even if I filter out all missing data, a large number of SNPs will still be uninformative (I have 776361 SNPs in total after filtering out missing data). Do you have any suggestions for polyploid data, or are there ways to improve this?

thanks, Cui


smallfishcui commented 4 years ago

Hi, I am back again with the same problem.

|  | W | X | Y | Z | D | stderr | Zscore | BABA | ABBA | nsnps |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Med | EU | Usland | CN | 0.795 | 0.140 | 5.68 | 196 | 22 | 163723 |
| 2 | Med | EU | AU | CN | -0.0341 | 0.0645 | -0.529 | 16 | 18 | 163723 |
| 3 | Med | Usland | EU | CN | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | Med | Usland | AU | CN | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | Med | AU | EU | CN | 0 | 1 | 0 | 0 | 0 | 0 |
| 6 | Med | AU | Usland | CN | 0 | 1 | 0 | 0 | 0 | 0 |
| 7 | EU | Med | Usland | CN | -0.795 | 0.140 | -5.68 | 22 | 196 | 163723 |

It seems this is not caused by a genotyping problem in one particular population; rather, the run goes wrong for some combinations of populations. For example, the data should contain 163723 SNPs in total, but the failed lines report 0 SNPs. How can that be?

Best regards, Cui

bumblenick commented 4 years ago

I don't think I can help further. Your data must be weird. Please note that the software assumes biallelic SNP data, and I suspect your polyploid data is making it unhappy.

Nick
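
To check how far an input VCF departs from the biallelic, diploid case mentioned above, a small scan of the VCF text can help. The sketch below assumes a plain, uncompressed, tab-separated VCF in which GT is the first FORMAT key. It counts records with more than one ALT allele and genotype calls whose ploidy is not two. The function name and the toy records are hypothetical.

```python
def summarize_vcf_records(vcf_lines):
    """Count multiallelic records and non-diploid genotype calls in VCF text."""
    summary = {"records": 0, "multiallelic": 0, "non_diploid_calls": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        summary["records"] += 1
        # Column 5 (ALT) lists alternative alleles separated by commas
        if "," in fields[4]:
            summary["multiallelic"] += 1
        # Columns 10 onward hold per-sample data when a FORMAT column exists
        if len(fields) > 9 and fields[8].split(":")[0] == "GT":
            for sample in fields[9:]:
                gt = sample.split(":")[0]
                if gt == ".":
                    continue  # fully missing call, ploidy unknown
                alleles = gt.replace("|", "/").split("/")
                if len(alleles) != 2:
                    summary["non_diploid_calls"] += 1
    return summary


toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "c1\t10\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/1",
    "c1\t20\t.\tA\tG,T\t.\t.\t.\tGT\t0/1/2/2\t0/0/0/0",
]
print(summarize_vcf_records(toy_vcf))
```

Nonzero counts for either category would suggest filtering or recoding the VCF before conversion. How polyploid calls should be recoded is a separate question that this sketch does not address.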


smallfishcui commented 4 years ago

Hi Nick, I just noticed an error message from convertf saying "snp order check fail; snp list not order", as someone posted here: https://github.com/DReichLab/EIG/issues/37 and here: https://www.biostars.org/p/389958/

So I guess a malformatted map file may be what caused the AdmixTools run to fail. Here is the format of the map file produced by vcftools:

7180003098625 340:374:- 0 2078
7180003098625 340:233:- 0 2219
7180003098625 340:213:- 0 2239
7180003098625 340:188:- 0 2264
7180003098625 340:133:- 0 2319
7180003098625 340:83:- 0 2369

Following your instructions, I changed the map file to:

1 340:374:- 0 2078
1 340:233:- 0 2219
1 340:213:- 0 2239
1 340:188:- 0 2264
1 340:133:- 0 2319
1 340:83:- 0 2369

However, I don't know how to format columns $2 and $4 (the SNP position and coordinate). Can you give some suggestions?

thanks, Cui
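
A quick way to look for ordering problems in a map file like the ones above is to scan for positions that decrease within a run of lines sharing the same chromosome code. The sketch below assumes the four-column layout shown in the thread (chromosome, SNP ID, genetic position, physical position). Whether this matches the exact check that convertf performs is an assumption, and the function name is hypothetical.

```python
def find_order_violations(map_lines):
    """List places where the position decreases within one chromosome block.

    Returns tuples of (line_number, chromosome, previous_position, position).
    """
    violations = []
    prev_chrom, prev_pos = None, None
    for number, line in enumerate(map_lines, start=1):
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, pos = fields[0], int(fields[3])
        if chrom == prev_chrom and pos < prev_pos:
            violations.append((number, chrom, prev_pos, pos))
        prev_chrom, prev_pos = chrom, pos
    return violations


# The six lines quoted in the thread are in increasing order
thread_lines = [
    "1 340:374:- 0 2078",
    "1 340:233:- 0 2219",
    "1 340:213:- 0 2239",
    "1 340:188:- 0 2264",
    "1 340:133:- 0 2319",
    "1 340:83:- 0 2369",
]
print(find_order_violations(thread_lines))

# Made-up case: a second scaffold, also renamed to "1", restarts at a low position
collapsed = ["1 a 0 2078", "1 b 0 2369", "1 c 0 15"]
print(find_order_violations(collapsed))
```

If every scaffold is renamed to the same chromosome code, positions that restart on each scaffold may show up as violations like the one in the second example.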