BoPeng / simuPOP

A general-purpose forward-time population genetics simulation environment.
http://bopeng.github.io/simuPOP/
GNU General Public License v2.0
30 stars 12 forks source link

Weight = -1 for CloneGenoTransmitter tries to create more offspring than it can handle #114

Closed marina-klemm closed 9 months ago

marina-klemm commented 10 months ago

Hi Bo, I am simulating this New Zealand population of whales that went from 30,000 individuals to 40 after whaling. I want to calculate haplotype frequencies at each generation since, given that this species has miraculously survived whaling and it's been recovering for the last few generations.

The problem I am facing is that I need to use CloneGenoTransmitter to simulate the mitochondrial transmission, and I want to use weight = -1 for this mating option so it is processed before the other mating scheme I have coded (which has been mentioned in [issues #12 , #15 and #75 ]). However, since I have some lethal events (which I coded), I keep facing the error "Mating scheme with a negative weight of 1 would like to produce 14696 offspring, but there are only 1500 unclaimed offspring left.". If I comment out the weight, it all works well.

Is there a downside to not adding weight to this function? Everything else I wanted to achieve has been completed. Thank you for your help!

PS: if needed, the script and two supporting txt files follow here. mtdna_1.txt snp_1.txt whale_for_Bo.txt

BoPeng commented 10 months ago

Without looking at your code in detail, the error message was produced because, when there are both negative and positive weights, the negative weights are processed (cloned), and simuPOP requires the offspring subpopuation is larger than the parental generation (with weight=-1). This behavior is questionable when there are more parents than offspring.

If you are proficient enough at compiling simuPOP from source, I can create a patch and change this behavior. Otherwise, you will have to go through a more tedious route, namely using a ConditionalMating scheme that uses a single CloneMating scheme when the bottleneck happens. This works more or less like (without testing):

pop.dvars().demo = demo

pop.evolve(...
preOps = [
    sim.Stat(popSize=True),
    ],  
sim.Conditionalmating('popSize > demo(gen)', 
    sim.CloneMating(...), 
    simu.HeteroMating(...), # existing one
    )
)

The pop.dvars().demo = demo is important here because the expression popSize > demo(gen) needs demo in the population's namespace.

BoPeng commented 10 months ago

I created a branch issue114. Please checkout the branch, compile simuPOP from the source (generally python setup.py install), and let me know if the patch works.

marina-klemm commented 10 months ago

Awesome, thank you so much!! Will let you know if I find any issues.

marina-klemm commented 10 months ago

Hi Bo, Thank you heaps again for creating the patch. I haven't been able to successfully install it unfortunately - something to do with a zlib.h, which was a problem when I first installed simuPOP but I cannot recall how I fixed it the first time around. So, I am going through the roundabout way you suggested.

Is this the right way to handle it?

pop.dvars().demo = demo #BO: is important here because the expression 
#popSize > demo(gen) needs demo in the population's namespace.

pop.evolve(
    initOps = [sim.InitSex(), sim.IdTagger(), sim.PyOperator(func=init_native_breeding_grounds)] +
                   [sim.InitGenotype(subPops = sub_population_names[i], freq=haplotype_frequencies[i], loci=[nb_loci]) for i in range(0, sub_population_count)] +
                   [sim.InitGenotype(subPops = sub_population_names[i], freq=[snp[n][i], 1-snp[n][i]], loci=[n]) for i in range(0, sub_population_count) for n in range(0, nb_loci-1)],
    # increase age by 1
    preOps = [sim.InfoExec('age += 1'),
              sim.Stat(popSize=True), #print pop size in each generation
              sim.PyEval(r'"%s\n" % subPopSize'),
              sim.PyOperator(func=removeOverspill), # randomly reduce population size so that parent generation fits
 #Export population in each generation
   Exporter(format='csv', infoFields=('age', 'ind_id', 'father_id', 'mother_id', 'nitrogen', 'carbon', 'feeding_ground', 'native_breeding_ground', 'migrate_to'), 
           output="!'dump_gen_%d.csv' % gen", step=1, begin=75)
           ],
    matingScheme = sim.ConditionalMating('popSize > demo(gen)',
        # age <= maxAge, copy to the next generation (weight=-1)
        # subPops is a list of tuples that will participate in mating. The tuple is a pair (subPopulation, virtualSubPopulation)
        # First, we propagate (clone) all individuals in all subpopulations (and all VSPs except the ones who are now in the VSP of deceased individuals) to the next generation
        sim.CloneMating(ops=[sim.CloneGenoTransmitter(chroms=[0,1])],
                            subPops=[(sub_population, 6) for sub_population in range(0, sub_population_count)], #MK:6 here is because
                            #two of the eight virtual subpopulations are deceased.
                            weight=-1
                            ), #MK: if weights are negative, they are multiplied to their parental subpopulation;
            #For example: if parental pop = (500, 1000), and weight = -2, next 
            #generation pop= (1000, 2000). 
            #For weight -1, it keeps the number of individuals from the parental generation.
            #ALSO: if there is a mix of negative and positive weights, the negative will be processed first.
        # Then we simulate random mating only in VSP 1 (ie reproductively mature individuals) within subpopulation (breeding/winter grounds)
        sim.RandomMating(ops=[sim.MitochondrialGenoTransmitter(),
                                  sim.MendelianGenoTransmitter(),
                                  sim.IdTagger(),
                                  sim.InheritTagger(mode=sim.MATERNAL, infoFields=['feeding_ground']),
                                  #an attempt to make sure carbon and nitrogen are
                                  #only inherited from their mother
                                  sim.InheritTagger(mode=sim.MATERNAL, infoFields=['nitrogen']),
                                  sim.InheritTagger(mode=sim.MATERNAL, infoFields=['carbon']),
                                  sim.InheritTagger(mode=sim.MATERNAL, infoFields=['native_breeding_ground']),
                                  sim.PedigreeTagger(),
                                  sim.PyOperator((report)) #commented it out, but
                                  #useful for debugging mother_id and father_id
                                  ],
                             subPops=[(sub_population, 7) for sub_population in range(0, sub_population_count)],
                             weight=0 #recommended by Bo:
                                 #The second mating scheme should have weight 0, 
                                 #and generate the rest of the offspring. If you get 
                                 #an error, it probably means the parental virtual 
                                 #subpopulation is empty.
                             ),
        subPopSize=demo),
           ##REMOVING TRAJECTORY, readd here if needed        
        # sim.ControlledRandomMating(subPopSize=model10,
        #                           freqFunc=traj.func(),
         #                          weight=1)] #MK: we decided to keep the same weight as the
        #mitochondrial transmitter.
postOps = [

# Determine the isotopic ratios in individuals
sim.PyOperator(func=postop_processing),
sim.Migrator(mode=sim.BY_IND_INFO),
    # count the individuals in each virtual subpopulation
    #sim.Stat(popSize=True, subPops=[(0,0), (0,1), (0,2), (1,0), (1, 1), (1, 2)]),
    # print virtual subpopulation sizes (there is no individual with age > maxAge after mating)
    #sim.PyEval(r"'Size of age groups: %s\n' % (','.join(['%d' % x for x in subPopSize]))")

    # Alternatively, calculate the Fst
    # FIXME: How does this actually work? Does it work for > 2 populations? I don't really understand it yet
    # ELC: it is a calculation that partitions variance among and between populations, and can be calculated as a 
    # global statistic or on a pairwise basis. We use it as an indication of genetic differentiation.

    sim.Stat(structure=range(1), subPops=sub_population_names, suffix='_AB', step=10),
    sim.Stat(numOfMales=True, 
             begin = 73, step = 1),     
    sim.PyEval(r"'Fst=%.3f \n' % (F_st_AB)", step=10), #Print Fst every 10 steps
    sim.Stat(alleleFreq=[1, 2, 3, 4, 5, 6, 7, 8, 9, 100], vars=['alleleFreq_sp'], step=10), #added this now, to
    #calculate the allele frequencies in selected loci
   sim.PyEval(r"'%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\n' % ("
       "subPop[0]['alleleFreq'][1][1], subPop[0]['alleleFreq'][2][1], subPop[0]['alleleFreq'][3][1],"
       "subPop[0]['alleleFreq'][4][1], subPop[0]['alleleFreq'][5][1], subPop[0]['alleleFreq'][6][1],"
       "subPop[0]['alleleFreq'][7][1], subPop[0]['alleleFreq'][8][1], subPop[0]['alleleFreq'][9][1],"
       "subPop[0]['alleleFreq'][100][1], subPop[1]['alleleFreq'][1][1], subPop[1]['alleleFreq'][2][1],"
       "subPop[1]['alleleFreq'][3][1], subPop[1]['alleleFreq'][4][1], subPop[1]['alleleFreq'][5][1],"
       "subPop[1]['alleleFreq'][6][1], subPop[1]['alleleFreq'][7][1], subPop[1]['alleleFreq'][8][1],"
       "subPop[1]['alleleFreq'][9][1], subPop[1]['alleleFreq'][100][1])", step=1, begin = 73)
],
finalOps= sim.Stat(alleleFreq=0, vars=['alleleFreq_sp']),
gen = 83
)

`

The error message I am getting is:


    matingScheme = sim.ConditionalMating('popSize > demo(gen)',

TypeError: __init__() got an unexpected keyword argument 'subPopSize'```

Thank you very much, once again.
BoPeng commented 10 months ago

Thank you heaps again for creating the patch. I haven't been able to successfully install it unfortunately - something to do with a zlib.h, which was a problem when I first installed simuPOP but I cannot recall how I fixed it the first time around. So, I am going through the roundabout way you suggested.

Please create a separate ticket with versions of OS, Python etc and the error messages. I suspect that you forgot to install packages like zlib1g-devel

BoPeng commented 10 months ago

TypeError: init() got an unexpected keyword argument 'subPopSize'```

I think ConditionalMating just defers the mating scheme to one of the two mating schemes, so subPopSize should be specified for these mating schemes.

BoPeng commented 10 months ago

popSize > demo(gen)

The expression is likely wrong since demo(gen) returns a list, and popSize is an integer. I recommend that you use things like

pop.dvars().demo = demo
sim.stat(pop, popSize=True)
sim.pyEval(pop, 'you expression')

to find the correct syntax to use.

marina-klemm commented 9 months ago

Hi Bo, I have installed a Linux VM into my Windows machine and I got everything up and running again. I installed the new patch of simuPOP that allows the offspring to have a smaller population than the parental generation, but I am still getting the same error. Is there any way to check if the new version is indeed installed, other than checking the mating.cpp file?

Thank you very much, once again. Have a great new year! Marina

mtdna_1.txt snp_1.txt whale_patch.txt Screenshot from 2024-01-21 16-26-02

Versions:

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Type "copyright", "credits" or "license" for more information.

IPython 8.20.0 -- An enhanced Interactive Python.

import simuPOP as sim
simuPOP Version 1.1.12 : Copyright (c) 2004-2016 Bo Peng
Revision 4630 (Jan 14 2024) for Python 3.10.12 (64bit, 1thread)
Random Number Generator is set to mt19937 with random seed 0x98336dbfe6f9fcde.
This is the standard short allele version with 256 maximum allelic states.
For more information, please visit http://simupop.sourceforge.net/,
or email [simupop-list@lists.sourceforge.net](mailto:simupop-list@lists.sourceforge.net) (subscription required).

Error message:

Traceback (most recent call last):

  Cell In[92], line 1
    pop.evolve(

  File ~/spyder-env/lib/python3.10/site-packages/simuPOP/__init__.py:434 in evolve_pop
    gen = simu.evolve(initOps, preOps, matingScheme, postOps, finalOps, gen)

ValueError: mating.cpp: 1882 Mating scheme with a negative weight of 1 would like to produce 14745 offspring, but there are only 1500 unclaimed offspring left.
BoPeng commented 9 months ago

Line 1882 of the source code no longer has that error message so you must have been running the older version. To verify, use

import simuPOP
simuPOP.__file__

to see what version of simuPOP is imported. Then, you can remove the directory shown and re-install simuPOP from source.

marina-klemm commented 9 months ago

Hi Bo, I had done just that and I see that the latest version is installed, which I believe is Version 1.1.12, Revision 4630. Is that right?

Thanks

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Type "copyright", "credits" or "license" for more information.

IPython 8.20.0 -- An enhanced Interactive Python.

runcell(0, '/home/mklemm/.config/spyder-py3/temp.py')
simuPOP Version 1.1.12 : Copyright (c) 2004-2016 Bo Peng
Revision 4630 (Jan 14 2024) for Python 3.10.12 (64bit, 1thread)
Random Number Generator is set to mt19937 with random seed 0x708448995227a3f7.
This is the standard short allele version with 256 maximum allelic states.
For more information, please visit http://simupop.sourceforge.net,
or email simupop-list@lists.sourceforge.net (subscription required).`
BoPeng commented 9 months ago

Is the problem still there? The patch does not change the version number so I really cannot tell.

marina-klemm commented 9 months ago

I just deleted the folder and installed it again (inside my virtual environment) using:

source spyder-env/bin/activate
sudo apt install git
git clone https://github.com/BoPeng/simuPOP.git
cd simuPOP
curl -v -O  https://github.com/BoPeng/simuPOP/pull/115/commits/0a59d2b8b327ebc3f01a0d012229523c10f2ee9d.patch
patch -p1 < 0a59d2b8b327ebc3f01a0d012229523c10f2ee9d.patch

The error is still the same:

Traceback (most recent call last):

  File ~/spyder-env/lib/python3.10/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File ~/Downloads/whale_patch.py:474
    pop.evolve(

  File ~/spyder-env/lib/python3.10/site-packages/simuPOP-1.1.12-py3.10-linux-x86_64.egg/simuPOP/__init__.py:434 in evolve_pop
    gen = simu.evolve(initOps, preOps, matingScheme, postOps, finalOps, gen)

ValueError: mating.cpp: 1882 Mating scheme with a negative weight of 1 would like to produce 14673 offspring, but there are only 1500 unclaimed offspring left.

Version and Revision:

runcell(0, '/home/mklemm/.config/spyder-py3/temp.py')
simuPOP Version 1.1.12 : Copyright (c) 2004-2016 Bo Peng
Revision 4630 (Jan 28 2024) for Python 3.10.12 (64bit, 1thread)
Random Number Generator is set to mt19937 with random seed 0xb8e1db495b715b09.
This is the standard short allele version with 256 maximum allelic states.
For more information, please visit http://simupop.sourceforge.net,
or email simupop-list@lists.sourceforge.net (subscription required).
Out[1]: '/home/mklemm/spyder-env/lib/python3.10/site-packages/simuPOP-1.1.12-py3.10-linux-x86_64.egg/simuPOP/__init__.py'

I don't know where exactly I went wrong in this. Thank you very much for your help!

marina-klemm commented 9 months ago

I think I found some source of error here:

source spyder-env/bin/activate
cd spyder-env/
git clone https://github.com/BoPeng/simuPOP.git
cd simuPOP
wget --output-document=issue114.patch https://github.com/BoPeng/simuPOP/pull/115/commits/0a59d2b8b327ebc3f01a0d012229523c10f2ee9d.patch
git apply -v  issue114.patch
Checking patch src/mating.cpp...
error: while searching for:
        {
            if (fcmp_gt(w_neg[i], 0.))
            {
                vspSize[i] = static_cast<ULONG>(parentSize[i] * w_neg[i]);
                DBG_ASSERT(all >= vspSize[i], ValueError,
                           (boost::format("Mating scheme with a negative weight of %1% would like to produce %2%"
                                          " offspring, but there are only %3% unclaimed offspring left.") %
                            w_neg[i] % vspSize[i] % all)
                               .str());
                all -= vspSize[i];
            }
        }

error: patch failed: src/mating.cpp:1878
error: src/mating.cpp: patch does not apply
BoPeng commented 9 months ago

Why do you need to manually apply the patch? You can just checkout the right branch.

marina-klemm commented 9 months ago

Like this? I will try to reset it and then switch to branch #115.

cd simuPOP
git reset --hard HEAD
git fetch -v origin pull/115/head:pr-115
git checkout pr-115
python setup.py install
marina-klemm commented 9 months ago

That was it! I needed to checkout the branch and not apply the patch itself, indeed. Thank you so much for your help! All the best!

BoPeng commented 9 months ago

Glad that it works. I have merged the PR and will make a minor release, although help may be needed for conda release to work.