BoPeng / simuPOP

A general-purpose forward-time population genetics simulation environment.
http://bopeng.github.io/simuPOP/
GNU General Public License v2.0
29 stars 12 forks source link

Extracting statistics from individual subpopulations #106

Closed cruzan-lab closed 12 months ago

cruzan-lab commented 1 year ago

I'm having trouble finding an example of methods to extract stats like allele frequencies from individual subpopulations. Using:

[simu.dvars(subPop=[1]).alleleFreq[0][1]]

Generates an error saying that subPop or subPops is not a valid parameter. Putting sim.Stat(subPops=[1], alleleFreq=0) in postOps seems to work fine. If someone could suggest the syntax to make this work, that would be great.

BoPeng commented 1 year ago

Please clarify the context that you are using these expressions. By simu you appears to be using Simulator so you need an index to individual population. If you need more example, go to http://bopeng.github.io/simuPOP/search.html?q=dvars&check_keywords=yes&area=default and search dvars.

cruzan-lab commented 12 months ago

Yes, it's using

simu = sim.Simulator(sim.Population(size=[popsize, popsize], loci=1, infoFields=['fitness', 'migrate_to']), rep=Nreps)

and then I'm trying to extract individual subpopulation stats.

allelefreq0 = [simu.dvars(x).alleleFreq[0][0] for x in range(Nreps)]

I get a list of allele frequencies averaged across the two subpopulations for each rep.

I have searched the documentation and have yet to find a way to parse allele frequencies for each subpopulation within each rep.

bioworkflows commented 12 months ago

What Stat Operator did you use? I think you may not have calculated subpopulation specific allele frequency in the first place. Please check the example in http://bopeng.github.io/simuPOP/userGuide_ch5_sec11.html#allele-count-and-frequency and see how subpopulation specific allele frequencies are calculated, and accessed (use subPop to get statistics for particular subpopulation).

cruzan-lab commented 12 months ago

in subpop 0, the frequency of allele 1 is set by the variable initfreq ( > 0)

the frequency of allele 1 is 0 in subpop 1 prior to gene flow.

initOps=[ sim.InitSex(maleProp=0.5),
sim.InitGenotype(freq=[(1 - initfreq), initfreq], subPops=0),
sim.InitGenotype(freq=[1, 0], subPops=1)]

bioworkflows commented 12 months ago

Allele frequencies and other statistics need to be explicitly calculated by Stat Before they can be accessed. Please see relevant documentation for details http://bopeng.github.io/simuPOP/userGuide_ch5_sec11.html#how-statistics-calculation-works

cruzan-lab commented 12 months ago

If I set the frequency of allele frequency to 1 in subpop 0 and to 0 in subpop 1 I get the average of the two subpops (i.e. 0.5) using this: allelefreq1 = [simu.dvars(x).alleleFreq[0][1] for x in range(Nreps)]

bioworkflows commented 12 months ago

Please read the documentation. If you want allele frequency from subpopulation 0, you are supposed to do something like

simu.evolve(
….
    Stat(alleleFreq=[0], subPops=[0, 1])
…
)

Then access the variables, if from outside of the evolve function

simu.dvars(x).subPop[0][‘alleleFreq’][0][1] for x in range(Nreps)
cruzan-lab commented 12 months ago

I pasted that in: allelefreq1 = simu.dvars(x).subPop[0][‘alleleFreq’][0][1] for x in range(Nreps) It gives me a syntax error at 'for'

bioworkflows commented 12 months ago

These are just Python dictionaries, you can just

print(simu.dvars(x))

to see the structure of the data.

cruzan-lab commented 12 months ago

I put the stat command (Stat(alleleFreq=[0], subPops=[0, 1])) in initOps. Is that correct?

BoPeng commented 12 months ago

Depends on when you want the statistics to be calculated. Operators in initOps and finalOps will be called once, and at each generation (also see parameters at, step etc) if in ops.

cruzan-lab commented 12 months ago

When I use print(simu.dvars(x)) I get information from the last rep, and it is not parsed by subpopulation.

bioworkflows commented 12 months ago

Exactly, if you have Stat In finalOps, it will calculate allele frequency at the end of simulation. Make sure you have subPops parameter in that operator. If it is troublesome for you to deal with operator, just do something like

for pop in simu.populations():
     stat(pop, …)
     print(pop.dvars())
cruzan-lab commented 12 months ago

When I do this outside of the simulation: for pop in simu.populations(): sim.Stat(pop, alleleFreq=[0], subPops=[0, 1]) print(pop.dvars()) I get dvars for each rep, and the allele frequencies are averaged across subpopulations.

I would like to get individual subpopulation allele frequencies into variables for writing to a file. I'm setting this up for a class I am teaching and want the students to be able to track allele frequencies in subpop 1 with migration and selection.

The main problem is that this command gives me a syntax error: allelefreq1 = simu.dvars(x).subPop[0][‘alleleFreq’][0][1] for x in range(Nreps)

bioworkflows commented 12 months ago

Stat and stat are different. The former is an Operator that needs to be called. The latter is a function that applies to its first parameter. Your sim.Stat did not calculate anything.

cruzan-lab commented 12 months ago

ok. I put sim.stat(pop, alleleFreq=[0], subPops=[0, 1]) in finalOps and I still get the same syntax error for simu.dvars

BoPeng commented 12 months ago

sim.stat is a function and should not be put in ops. Please read the relevant sections of the documentation instead of aimlessly trying different things.