Open ashander opened 6 years ago
To answer question in https://github.com/ashander/ftprime/pull/48#issuecomment-340484513 @hyanwong it'd be nice to have examples that are relatively simple to read. I realize that your example grew out of an ongoing project though.
The current state is great (thanks for finding lots of bugs/issues!) but adding a lot more code complexity to the PR /branch would make things hard to understand as an ftprime example.
Yes, I agree about making them easy to read / simpler & not relying on external code. I feel a bit guilty about bloating the code. However, I do think it is worth having examples that can be called either as a python routine in their own right or as a main()
routine using argparse. This needn't add much to the complexity if it is done right, I suspect. FWIW, I think I've done it in too hacky a manner in selective_sweep.py
. Also, as an aside, I think it's also worth using import logging
and logging.basicConfig(...)
rather than your own code for log files (easier to turn this on and off)
Cool. Those are good comments and I agree separating code from main() is nice style. It's all a balancing act.
Code to allow SFS-like testing of example files is at https://github.com/hyanwong/ftprime/commit/40d01ae84484dc894e4508de2e8d43759328ce64
Since it updates the SFS incrementally, you will need to pass it a function to calculate various statistics iteratively. I use a hacked together script which could probably do with some tidying, something like (untested)
from SFS import incrementalSFS
def H(n):
"""Returns an approximate value of n-th harmonic number.
http://en.wikipedia.org/wiki/Harmonic_number
"""
# Euler-Mascheroni constant
gamma = 0.57721566490153286060651209008240243104215933593992
return gamma + math.log(n) + 0.5/n - 1./(12*n**2) + 1./(120*n**4)
def Theta_W(sfs, n):
"""
Calculate the (mutationless) Watterson estimator
"""
return sum(sfs.values()) / H(n)
def Theta_pi(sfs, n):
"""
Calculate the (mutationless) nucleotide diversity
2 sum(i(n−i)*ξi) / n / (n-1)
"""
return 2 * sum([i*(n-i)*sfs[i] for i in sfs.keys()]) / n / (n-1)
def Theta_H(sfs, n):
"""
Calculate the theta estimate used in Fay & Wu's H
2 * sum(i^2 ξi) / n / (n-1)
"""
return 2 * sum([(i**2) * sfs[i] for i in sfs.keys()]) / n / (n-1)
def save_estimates(start, end, sfs, n):
watt = Theta_W(sfs, n)
ndiv = Theta_pi(sfs, n)
tH = Theta_H(sfs, n)
T_W.append((watt, end-start))
T_pi.append((ndiv, end-start))
T_H.append((tH, end-start))
D.append(((end+start)/2, ndiv - watt))
FayWuH.append(((end+start)/2, ndiv - tH))
ts = msprime.load("sweepfileXXX.hdf5")
#globals (yuck)
T_W, T_pi, T_H, D, FayWuH = [], [], [], [], []
incrementalSFS(ts, save_estimates)
#plot D here using something like
#import matplotlib
#plt.plot(*zip(*D))
#plt.set_xlim(0,ts.get_sequence_length())
Ideally we'll have a few stand-alone example scripts (eg as contributed by @hyanwong in #48 ) but these will: