BarinthusBio / HLAfreq

Aggregate HLA allele frequency data from allelefrequencies.net at large, multi-population scale
MIT License

Relative frequencies across loci #6

Open Yegor13 opened 3 months ago

Yegor13 commented 3 months ago

Hello,

Thank you very much for a nice package!

My question is: can one use allelefrequencies.net data to calculate the relative usage of different loci in a population or ethnic group? Does that actually make sense? This notebook, https://github.com/BarinthusBio/HLAfreq/blob/main/examples/single_country.ipynb, mentions that frequency estimates can only be combined for a single locus at a time, but why?

DAWells commented 1 month ago

Thanks for your question, and sorry it's taken me a while to get to it.

It's because of the way the functions are written: they assume all the data is for a single locus when adding unmeasured alleles and when summing allele frequencies and sample sizes. There's nothing stopping you from calculating the allele frequencies of different loci separately.
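To illustrate handling each locus separately, here is a minimal sketch in plain Python. It does not use HLAfreq's API or its actual combining method: `combine_single_locus` and `combine_each_locus` are hypothetical helpers, the study records are made up, and a simple sample-size-weighted mean stands in for the real estimate. An allele that a study does not report is treated as having frequency 0 in that study.

```python
from collections import defaultdict


def combine_single_locus(records):
    """Sample-size-weighted mean frequency per allele for ONE locus.

    Hypothetical helper, not part of HLAfreq. Each record is a dict with
    keys "study", "locus", "allele", "freq" and "n" (sample size).
    """
    loci = {r["locus"] for r in records}
    if len(loci) != 1:
        raise ValueError(f"expected a single locus, got {sorted(loci)}")

    # One sample size per study; alleles a study did not report
    # implicitly contribute a frequency of 0 for that study.
    study_n = {r["study"]: r["n"] for r in records}
    total_n = sum(study_n.values())

    combined = defaultdict(float)
    for r in records:
        combined[r["allele"]] += r["freq"] * r["n"] / total_n
    return dict(combined)


def combine_each_locus(records):
    """Split records by locus, then combine each locus on its own."""
    by_locus = defaultdict(list)
    for r in records:
        by_locus[r["locus"]].append(r)
    return {locus: combine_single_locus(rs) for locus, rs in by_locus.items()}


# Made-up example data: two studies for locus A, one for locus B.
records = [
    {"study": "s1", "locus": "A", "allele": "A*01:01", "freq": 0.2, "n": 100},
    {"study": "s1", "locus": "A", "allele": "A*02:01", "freq": 0.8, "n": 100},
    {"study": "s2", "locus": "A", "allele": "A*01:01", "freq": 0.4, "n": 300},
    {"study": "s2", "locus": "A", "allele": "A*02:01", "freq": 0.6, "n": 300},
    {"study": "s1", "locus": "B", "allele": "B*07:02", "freq": 1.0, "n": 100},
]

per_locus = combine_each_locus(records)
```

Passing mixed-locus records straight to `combine_single_locus` raises an error, which mirrors the single-locus assumption described above. Within each locus the combined frequencies sum to 1, but frequencies from different loci are not on a shared scale and should not be compared as "relative usage".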

However, if you're interested in calculating linkage disequilibrium or haplotype frequencies, you have to be more careful. You cannot calculate the frequency of people with both HLA-A01 and HLA-B01 from the frequencies of these alleles separately, because there is strong linkage disequilibrium between HLA loci. You have to use studies that have measured HLA-A and HLA-B in the same individuals.
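A small worked example may make this concrete. The haplotype frequencies below are invented for illustration (they are not real data from allelefrequencies.net), and `allele_freq` is a hypothetical helper. The sketch compares the joint frequency observed when both loci are measured together with the product of the separate allele frequencies, which is what you would get by wrongly assuming independence; the difference is the standard linkage disequilibrium coefficient D.

```python
# Made-up two-locus haplotype frequencies (sum to 1), for illustration only.
haplotype_freq = {
    ("A01", "B01"): 0.15,
    ("A01", "B_other"): 0.05,
    ("A_other", "B01"): 0.05,
    ("A_other", "B_other"): 0.75,
}


def allele_freq(haplotypes, locus_index, allele):
    """Marginal frequency of one allele, summed over all haplotypes."""
    return sum(f for hap, f in haplotypes.items() if hap[locus_index] == allele)


p_a = allele_freq(haplotype_freq, 0, "A01")  # about 0.20
p_b = allele_freq(haplotype_freq, 1, "B01")  # about 0.20

observed = haplotype_freq[("A01", "B01")]    # measured jointly: 0.15
expected_if_independent = p_a * p_b          # about 0.04, from marginals only

# Linkage disequilibrium coefficient: D = p_AB - p_A * p_B
d = observed - expected_if_independent       # about 0.11

print(f"observed={observed:.2f} "
      f"product of marginals={expected_if_independent:.2f} D={d:.2f}")
```

With these numbers the marginals alone would underestimate the A01-B01 haplotype frequency nearly fourfold. The marginals carry no information about D, so the joint frequency has to come from studies that typed both loci in the same individuals.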

Since allelefrequencies.net does provide access to haplotype data, it would be possible to extend HLAfreq to estimate haplotype frequencies and linkage disequilibrium too. But that would depend on interest from users.

Does that answer your question?