ANGSD / angsd

Program for analysing NGS data.

Heterozygosity/genotype calling alternative allele frequencies #529

Open Mvwestbury opened 2 years ago

Mvwestbury commented 2 years ago

Hi

I have simulated some haploid data with Illumina error rates and ancient DNA data. It is ~25x coverage. I wanted to see how the damage influences heterozygosity/genotype calls. I noticed that heterozygous sites are being called relatively frequently due to the errors despite high coverage.

I printed the counts of the different nucleotides to see what may be driving this and compared them with the genotype calls (-doGeno 5). Here is an example where I pasted the genotype and counts next to each other:

VHQK01014937.1 75 G A GA 2 0 22 0

As you can see, only 2 of 24 reads support A, yet the site is given a GA genotype. That is clearly too few reads for the site to be heterozygous.

Is there any way to raise this threshold so that, say, 30% of the reads must support the alternative allele before a site is called heterozygous?
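For illustration, a hard allele-fraction cutoff of this kind could be applied after the fact to the pasted genotype-plus-counts lines. This is only a rough sketch, not an ANGSD option: the function name `allele_fraction_genotype`, the `min_het_fraction` parameter, and the column layout (chrom, pos, major, minor, called genotype, then counts in A C G T order) are assumptions inferred from the example line above.

```python
def allele_fraction_genotype(line, min_het_fraction=0.30):
    """Re-call a genotype from read counts using a hard allele-fraction cutoff.

    Assumed columns (inferred from the example line, hypothetical layout):
    chrom, pos, major, minor, called genotype, counts for A, C, G, T.
    """
    fields = line.split()
    chrom, pos, major, minor = fields[0], int(fields[1]), fields[2], fields[3]
    counts = dict(zip("ACGT", map(int, fields[5:9])))

    # Only reads supporting the major or minor allele are considered.
    depth = counts[major] + counts[minor]
    if depth == 0:
        return chrom, pos, "NN", 0.0

    minor_fraction = counts[minor] / depth
    if minor_fraction >= 1 - min_het_fraction:
        genotype = minor + minor          # almost all reads carry the minor allele
    elif minor_fraction >= min_het_fraction:
        genotype = major + minor          # enough support on both sides
    else:
        genotype = major + major          # minor allele treated as error/damage
    return chrom, pos, genotype, minor_fraction


example = "VHQK01014937.1 75 G A GA 2 0 22 0"
print(allele_fraction_genotype(example))
```

With the example site, 2 of 24 reads give a minor-allele fraction of about 0.08, so the sketch returns GG rather than GA. A fixed cutoff like this ignores base qualities and depth, so it is a blunt filter rather than a replacement for likelihood-based calling.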

I also see this problem when using -doSaf to estimate the SFS for heterozygosity, so it is not specific to -doGeno.

Thanks, Mick

TonyKess commented 1 year ago

Have you experimented with setting different SNP p-values? Changing this value might affect the stringency of heterozygous calls.

Mvwestbury commented 1 year ago

Yeah I tried fiddling with it and it didn't seem to help

TonyKess commented 1 year ago

Have you inspected the likelihoods directly? I wonder whether this would still have an impact once the likelihoods themselves are taken into account, for example by PCAngsd or in SFS estimation.
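As a rough way to reason about those likelihoods for the example site (22 reads G, 2 reads A), here is a minimal sketch of a simplified diploid genotype-likelihood calculation. It is not ANGSD's implementation: ANGSD selects among several likelihood models with -GL, and the thread does not say which one was used. The function name `genotype_log10_likelihoods`, the single shared error rate, and the Phred values tried are all assumptions made for illustration.

```python
import math


def genotype_log10_likelihoods(n_major, n_minor, error_rate):
    """log10 likelihoods for major/major, major/minor and minor/minor.

    Simplified model (an assumption, not ANGSD's code): every read has the
    same error rate, and an error yields each of the other three bases
    with equal probability.
    """
    p_match = 1.0 - error_rate              # read shows the allele it came from
    p_miss = error_rate / 3.0               # read shows one specific other base
    p_het = 0.5 * p_match + 0.5 * p_miss    # either allele under a heterozygote
    return {
        "hom_major": n_major * math.log10(p_match) + n_minor * math.log10(p_miss),
        "het": (n_major + n_minor) * math.log10(p_het),
        "hom_minor": n_major * math.log10(p_miss) + n_minor * math.log10(p_match),
    }


# Try a few hypothetical base qualities for the 22 G / 2 A site.
for phred in (20, 30, 40):
    error_rate = 10 ** (-phred / 10)
    logl = genotype_log10_likelihoods(n_major=22, n_minor=2, error_rate=error_rate)
    best = max(logl, key=logl.get)
    print(phred, best, {name: round(value, 2) for name, value in logl.items()})
```

Under this simplified model the homozygous genotype has the highest likelihood at Q20 and Q30, but at Q40 the heterozygote wins, because two mismatches with very high base quality are less probable as errors than 24 reads drawn evenly from two alleles. Damage-induced substitutions carry ordinary base qualities, so this may be one reason a 2/24 site can still be called GA.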