BoPeng / simuPOP

A general-purpose forward-time population genetics simulation environment.
http://bopeng.github.io/simuPOP/
GNU General Public License v2.0
30 stars 12 forks source link

Genotype frequencies for multiple loci #76

Closed JKHopf closed 5 years ago

JKHopf commented 5 years ago

Hi Bo,

Thank you for creating and maintaining simuPOP.

I am new to simuPOP and python, and I was wondering if there was a way to get the frequencies of the whole genotype, rather than the genotype for each locus?

I.e. for a scenario with 2 loci (named 'A' & 'B'), each on separate chromosomes and each with 2 alleles, an individual could have one of nine genotypes (AABB, AaBB, aaBB, etc.). At the moment I can pull out the frequencies for the genotypes at either locus, just using genoFreq, but ideally, I would like to know the frequencies of each of the nine total genotypes (e.g. genotype: frequency = 0000: 0.1, 0100: 0.05, 1100: 0.2, etc.). Also any ideas on how to format this output for exporting in a text file using 'output=' ?

Cheers Jess

import simuPOP as sim
pop = sim.Population(100, loci=[1, 1], lociNames=['A', 'B'],
                     alleleNames=[['A', 'a'], ['B', 'b']],)
sim.initGenotype(pop, freq=[0.5, 0.5])

pop.evolve(
    initOps=sim.InitSex(),
    matingScheme=sim.RandomMating(),
    postOps=[
        sim.Stat(genoFreq=['A', 'B']),
        sim.PyEval(r'"%d, %s\n" %'
            ' (gen, genoFreq)')
        ],
    gen=3
) 
BoPeng commented 5 years ago

There is haploFreq, which counts by haplotype. There is no native method to count the "complete genotype/haplotype" as you described, but you can do that easily with a PyOperator if needed. Something like (very quickly, no check of syntax or anything)

def countAll(pop):
    counts = defaultdict()
    for ind in pop.individuals():
        counts[ (ind.allele(a0, 0), ind.allele(a1, 0), ind.allele(a0, 1), ind.allele(a1, 1)) ] += 1
    # write to a file or something
    return True

... PyOperator(func=countAll) ...
JKHopf commented 5 years ago

Thank you for the quick reply @BoPeng, that helped solve a good chunk of my problem.

For anyone looking into this in the future, my solution was

import simuPOP as sim
pop = sim.Population(100, loci=[1, 1], lociNames=['A', 'B'],
                     alleleNames=[['A', 'a'], ['B', 'b']],)
sim.initGenotype(pop, freq=[0.5, 0.5])
sim.stat(pop, genoFreq=['A', 'B'])  # both loci indexes and names can be used.

def countAll(pop):
    # create a defualt dictionary, called counts,  where a new key is
    # produced if its not there already
    counts = defaultdict(int)
    for ind in pop.individuals():
        # find the genotype for each individual, and then add one to
        # the value of the key with that genotype i the dicitonary
        # .allele(locus index, chromosome index)
        # will give alleles of first homolgous set and then second
        # (same layout as pop.individual().genotype(), but as a tuple)
        # (ABAB, AbAB, ABaB, etc.)
        counts[(ind.allele(0, 0), ind.allele(1, 0),
                ind.allele(0, 1), ind.allele(1, 1))] += 1
    print(counts)
    return True

pop.evolve(
    initOps=sim.InitSex(),
    matingScheme=sim.RandomMating(),
    postOps=[
            sim.PyOperator(func=countAll)
        ],
    gen=3
)