BoPeng / simuPOP

A general-purpose forward-time population genetics simulation environment.
http://bopeng.github.io/simuPOP/
GNU General Public License v2.0
29 stars 12 forks source link

How to create new loci in an existing population? #43

Closed ashander closed 6 years ago

ashander commented 6 years ago

My reason for doing this is to implement infinite sites. I have read the manual page section that shows using infinite alleles as a way to implement infinite sites. That section ends by stating

However, if the location of loci cannot be determined beforehand, it is sometimes desired to create new loci as a result of mutation. A customized operator can be used for this purpose (see Example newOperator), but extra attention is needed to make sure that other operators are applied to the
correct loci because loci indexes will be changed with the insertion of new
loci. This technique could also be used to simulate mutations over long
sequences.

I'm interested in doing this but can't find the a way to add loci.

PS: It seems possible that the mutation space operators https://github.com/BoPeng/simuPOP/commit/ 5ceddb33d08f97bf35e27c5a640bb55941d66826 are designed to allow this in a different way. I looked around for examples of their use and can find this one [srv.py](https://github.com/BoPeng/simuPOP-examples/blob/ master/published/simuRareVariants/srv.py) but it seems like it may be using an older version (1.0.5)?

BoPeng commented 6 years ago

You will need to use the Population.addLoci function to insert loci dynamically, and this has to be done in a PyOperator, so the usage should look something like

def insertLoci(pop, options):
    pop.addLoci(xxxxx)

pop.evolve(
    ops=[
        PyOperator(func=insertLoci, params=...)
    ],

You could derive a class from PyOperator but that is not necessary.

As noted, because insertion of loci will change the index of other loci, you will need to either use name of loci or adjust loci index if you apply a selection operator according to genotype at certain loci.

A bigger problem is that insertion of loci is an expensive task for simuPOP especially for array-type genotype storage models (short, long, binary) since genotypes of all individuals will be changed (adding mutant or wildtype alleles). This is why we developed srv that tried to store mutations. That approach was later formalized as the mutant storage model (alleleType='mutant') but srv can still be faster. srv was written for Python 2.7 but the latest version should be easy to convert it Python 3. The problem is more with the deprecated parameter handling code (written before Python introduces argparse), not with the main code.

ashander commented 6 years ago

Thanks @BoPeng ! I think alleleType='mutant' was the missing piece and for my simple application gets close enough to an efficient 'infinite' sites model