Closed wlz0726 closed 7 years ago
Hi @ANGSD
I have the same questions. We really need your help, Thanks.
Best
It depends on what analysis you are interested in afterwards. The GLs are calculated independently per sample. If you are interested in per-population analysis, you should of course do the analysis per population. If you are interested in multi-population analysis, you should use all populations.
Hi @ANGSD , Thanks for the reply.
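I'd like to add a few things.
I have some populations with unbalanced sample sizes (with low to medium sequencing depth), for example: 10 samples in PopA, 20 in PopB, 50 in PopC, and 50 in PopD.
I'd like to do some population-based (right?) SNP filtering such as HWE, MAF, sample missing percentage (nInd in mafs.gz), SB3, baseQ_Pval, etc. Filtering on all samples together (PopA + PopB + PopC + PopD) will bias the results for the population with a small sample size (PopA).
In my understanding, here is what I should do:
- do the SNP calling in each population separately
- do the filtering in each population (HWE, MAF, nInd, SB3)
- merge the SNP sites from the different populations (overlaps), possibly filtering tri-allelic sites, to generate the "final SNP sites"
- generate GL files based on these "final SNP sites" with the -sites parameter
- do phasing with beagle and get the genotype data
Is this a proper way?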
Thanks
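To make the sample-size concern concrete, here is a small arithmetic sketch. The sample sizes are the ones from the post; the allele frequency (20% in PopA, absent elsewhere) and the 5% MAF cutoff are hypothetical values chosen only for illustration, and the samples are assumed to be diploid.

```python
# Sample sizes are taken from the post; everything else is an assumed example value.
sample_sizes = {"PopA": 10, "PopB": 20, "PopC": 50, "PopD": 50}
freq_in_popA = 0.20   # assumed frequency of an allele found only in PopA
maf_cutoff = 0.05     # assumed MAF threshold


def pooled_frequency(freqs, sizes):
    """Allele frequency when all (diploid) samples are pooled into one group."""
    total_chromosomes = sum(2 * n for n in sizes.values())
    allele_copies = sum(freqs.get(pop, 0.0) * 2 * n for pop, n in sizes.items())
    return allele_copies / total_chromosomes


pooled = pooled_frequency({"PopA": freq_in_popA}, sample_sizes)

print(f"frequency in PopA: {freq_in_popA:.3f}")
print(f"pooled frequency:  {pooled:.4f}")
print("passes filter within PopA:", freq_in_popA >= maf_cutoff)
print("passes filter when pooled:", pooled >= maf_cutoff)
```

Under these assumptions the allele is common within PopA but falls below the cutoff once the 120 other samples dilute it, which is the bias described above.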
This sounds correct if you want to do SNP and genotype calling. However, many good analyses in angsd are based on the raw GLs, and you might not need to do genotype calling.
On Thu, Nov 24, 2016 at 12:43 PM, Lizhong Wang notifications@github.com wrote:
Hi @ANGSD https://github.com/ANGSD , Thanks for the reply.
I'd like to add few things.
I have some populations with unbalanced sample size (with low to median sequencing depth), for example: 10 samples in PopA and 20 in PopB, 50 in PopC, 50 in PopD.
I'd like to do some population based (right?) SNP filtering such as HWE, MAF, sample Missing Percentage (nInd in mafs.gz), SB3, baseQ_Pval et al. . It will bias the results for population with small sample size (PopA) if I do it with all samples (PopA + PopB + PopC + PopD).
In my understanding, here is what I should do:
- do the SNP calling in separate population
- do the filtering in each population (HWE, MAF, nInd, SB3)
- merge the SNP sites in different population (Overlaps), maybe need filter tri-allele, generate the "final SNP sites"
- generate GL files based on this "final SNP sites" with -sites parameter
- do phasing with beagle and get the Genotype data
Is this a proper way?
Thanks
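The merge step in the list above (overlapping sites, dropping tri-allelic ones) could be sketched roughly as follows. This is only an illustration, not part of angsd: `merge_sites` and the toy data are hypothetical, "overlaps" is interpreted here as sites that passed filtering in every population, and the per-population (major, minor) pairs are assumed to have been read already from each population's filtered results. The four-column output (chromosome, position, major, minor) is an assumption about what one would then pass to -sites; check the angsd documentation for the exact format and indexing step.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[str, int]      # (chromosome, position)
Alleles = Tuple[str, str]   # (major, minor) as called within one population


def merge_sites(per_pop: Dict[str, Dict[Site, Alleles]]) -> List[Tuple[str, int, str, str]]:
    """Keep sites present in every population that are biallelic across all of them."""
    pops = list(per_pop)
    shared: Set[Site] = set(per_pop[pops[0]])
    for pop in pops[1:]:
        shared &= set(per_pop[pop])

    merged = []
    for site in sorted(shared):
        # Major and minor can swap between populations, so compare unordered allele sets.
        alleles = set()
        for pop in pops:
            alleles.update(per_pop[pop][site])
        if len(alleles) == 2:
            # Report the alleles in the order seen in the first population.
            first, second = per_pop[pops[0]][site]
            merged.append((site[0], site[1], first, second))
        # Sites with more than two distinct alleles are dropped as multi-allelic.
    return merged


# Toy input: hypothetical filtered SNPs for two populations.
pop_sites = {
    "PopA": {("chr1", 100): ("A", "G"), ("chr1", 200): ("C", "T"), ("chr1", 300): ("G", "A")},
    "PopB": {("chr1", 100): ("G", "A"), ("chr1", 200): ("C", "G"), ("chr2", 50): ("T", "C")},
}

final_sites = merge_sites(pop_sites)
for chrom, pos, major, minor in final_sites:
    print(f"{chrom}\t{pos}\t{major}\t{minor}")
```

In the toy data, chr1:100 is kept even though major and minor are swapped between populations, chr1:200 is dropped because three alleles are seen, and the remaining sites are dropped because they are missing from one population. Taking the union instead of the intersection would be an equally defensible reading of "merge"; the choice depends on the downstream analysis.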
Yeah, I know that most follow-up analyses in angsd/ngsTools are based on GLs (including the monomorphic sites as background, which helps a great deal when computing posterior probabilities of allele frequencies or summary statistics).
I want to make sure that I'm doing the right thing when I need genotypes. Now I'm more confident about that. Thank you.
all the Best
Super, I'll close this issue; feel free to reopen if needed.
Hi, I have 5 domestic populations and 1 wild population. When I run doMaf to get SNP positions, should I do this separately (in 6 populations), use all individuals in one single BAM list (all in one population), or just use 2 populations (wild and domestic)?
Does it make a big difference in angsd?
I assume internal population structure will bias the GL estimation process and the LRT test of SNPs, so I should always do it separately (for each population)?
Am I right?
Thanks.