dportik / dadi_pipeline

An accessible and flexible tool for fitting demographic models with dadi using custom or published models (available here), conducting goodness of fit tests, and plotting.
GNU Lesser General Public License v3.0
63 stars 30 forks source link

pop1 or pop2 #21

Closed cistarsa closed 1 year ago

cistarsa commented 2 years ago

Hello and thank you for developing this useful program! I'm determining demographic history between two divergent populations by using stairway plot with their respective SFS as well as a joint 2DSFS. Both sets of SFS were generated in ANGSD and realSFS command. Anyway, after modeling no_mig and no_mig_size in dadi, favoring no_mig, I've calculated Ne1 and Ne2, which are very similar to those estimated by stairway except they're flipped. As in the estimate from dadi for pop1 is much closer to that of pop2 from stairway. Furthermore, the difference between pop1 and pop2 is around 4 fold. I'm having trouble diagnosing how this "flip" occurred. Here is the joint SFS and the realSFS command used to generate. Thank you!

./angsd/misc/realSFS pop1_sfs.saf.idx pop2.saf.idx -P 24 -fold 1 >> 2d_Aug25.sfs

joint sfs

1992768.151735 49159.185835 13568.706837 5144.128084 1913.596299 999.519330 205.482639 317.066135 20.735284 37.404963 69.785701 34.876663 0.057500 15.633086 28.633686 0.000000 0.000000 0.000000 0.000000 0.000000 6.317693 7.210363 0.000000 0.000000 0.000000 0.000000 38878.062382 20352.036277 9480.901455 4017.042444 2384.410338 395.734227 787.859164 136.130698 181.978437 103.240704 66.932155 0.000000 0.000000 3.707234 19.490405 0.000000 2.666079 6.410877 0.000004 0.000000 0.000000 0.000000 0.000000 0.016741 0.197921 0.000007 13176.123802 10293.696090 6750.689010 4360.456236 1260.212058 1934.486998 325.789728 393.768625 2.452084 0.232651 13.352794 85.345057 0.000000 0.160121 29.779686 2.186559 36.045907 0.000000 0.000000 0.000000 0.000000 0.000000 0.051804 4.477613 0.000000 0.000000 4733.461799 4144.621370 4835.960650 2195.747697 2574.639735 445.190697 550.637866 587.236537 525.537440 302.970953 11.219059 15.770535 35.735601 22.092268 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2140.306066 2668.668947 1674.437278 3372.508653 1197.347168 1643.737942 1723.931919 481.344369 190.733005 51.974370 78.733445 9.665332 47.180923 1.614863 0.003064 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 832.276756 1063.934277 1986.509981 88.099352 2156.664247 648.125382 531.768348 240.451802 391.700429 424.623079 85.777707 63.141629 194.847438 167.000549 0.009280 0.037480 18.773547 0.000000 0.000000 0.000000 0.000000 0.000954 7.968417 0.000000 0.000000 0.000000 739.604460 629.468934 423.212702 1776.325351 155.096740 517.812247 369.436148 770.468255 81.238149 222.710023 391.115134 33.517590 167.502818 24.537579 7.698388 99.847062 4.316497 0.000000 10.966750 0.000000 0.000000 10.217667 4.406112 0.000000 0.000000 0.000000 37.974083 148.534267 1002.414782 394.728825 600.268579 898.312924 564.789979 157.242107 1083.413795 6.000964 531.881782 15.462590 11.609458 0.248786 25.616397 100.979360 0.572815 0.000000 0.311193 0.000000 0.000000 0.499740 0.012581 0.000000 0.000000 0.000000 65.583895 356.504100 48.435549 100.705813 710.264225 582.189103 466.975016 317.604143 137.877517 146.310430 60.815252 65.132446 39.219711 0.964052 41.290089 12.159369 0.890080 0.000000 0.000008 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 280.351937 76.053207 14.792154 690.783181 160.884652 400.878739 719.555662 542.700832 265.625716 569.583987 367.542732 56.303517 309.836651 7.157761 132.896794 20.242474 74.024130 0.000000 0.000104 13.806615 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 26.166161 0.084342 78.630046 83.936783 148.797783 59.864401 41.395521 245.395559 218.331637 280.571417 245.956991 212.876902 56.000214 256.813893 6.314497 21.573326 27.683155 0.002035 0.048725 49.278511 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.209926 0.008622 103.470051 28.412793 55.447948 189.654951 190.070071 34.126611 529.432590 433.663302 163.452780 127.750070 42.452156 180.250336 14.067964 130.855168 4.547521 0.127857 0.393179 0.308944 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 33.892208 0.529200 86.555434 94.957193 13.206530 244.676654 138.749181 103.011653 41.957844 125.369902 110.495783 149.753720 95.892346 78.798231 40.210882 152.607802 22.027314 0.567482 120.548154 0.501512 0.015848 0.000000 0.000000 0.000000 0.000000 0.000000 57.504125 55.802606 3.514069 6.097744 1.803927 242.849581 135.744904 366.046997 21.047931 33.703644 299.428843 381.726250 402.853409 151.273238 60.314593 129.770789 82.173396 57.089250 74.780582 0.006226 33.286141 0.057329 0.665927 0.000000 0.000000 0.000000 16.770003 22.760802 0.028306 8.965116 16.914629 89.614913 4.515001 26.291564 153.070598 361.663293 83.250935 106.498500 62.599476 262.648926 96.730943 28.560368 131.546492 38.824921 38.627045 7.889028 21.472512 59.431930 24.514336 0.148538 0.000010 0.000000 0.433045 0.009227 0.000842 57.404641 20.106025 1.485078 181.654651 99.985972 111.267726 58.919322 22.685005 39.551974 31.813405 201.139792 158.896916 105.849952 87.844511 138.254852 2.500197 29.970280 0.554229 0.071983 0.008672 22.303755 3.940354 0.000001 0.002929 0.000001 0.008032 74.444486 40.691182 0.001031 4.265159 12.311178 58.711206 82.566326 81.775446 83.319485 192.032526 112.411820 202.742810 27.777605 19.767527 36.328568 0.234738 75.123197 0.917092 0.000000 0.000000 1.869007 4.079902 0.000004 0.001853 0.000000 0.383542 0.391963 5.563357 0.000011 0.051695 0.348889 25.927175 9.254005 194.639757 109.704397 107.209612 37.574982 105.507996 41.590609 86.725656 71.394120 1.124972 106.548455 1.463686 0.000000 0.010401 0.132531 2.893621 0.000001 7.488652 0.000000 7.262298 0.000001 0.097736 0.000830 0.004157 4.071843 65.346136 1.877036 95.328365 69.697578 217.000244 25.466528 238.627492 25.210326 131.334381 230.132121 81.965808 106.780689 38.712011 0.334674 6.139749 1.965220 8.406371 0.000000 4.631695 2.379329 2.350995 0.000000 0.000000 29.168283 0.146997 3.583515 185.820498 0.397605 96.510176 35.340741 146.424199 3.979446 82.389971 118.410992 69.127401 43.019561 408.457062 52.627223 9.742085 96.259896 50.430773 16.201770 3.599595 0.000000 0.000000 7.468630 1.598952 0.000000 0.000000 18.062332 3.631159 0.114993 13.333394 0.060626 57.234638 21.589903 40.200792 3.680328 87.093076 41.306965 46.036296 82.799656 63.496949 119.192167 0.517299 42.741642 39.162238 56.844064 4.213102 0.000000 0.000000 0.000000 0.172942 0.000000 0.000000 0.000000 14.641175 0.000017 0.000639 0.019176 5.902029 0.614084 13.254259 51.035969 154.759617 105.433197 202.657164 10.082911 13.230453 6.421261 0.104979 63.612889 6.363969 46.137592 1.614277 0.000000 0.000000 0.000000 0.000070 0.000000 0.000000 0.000000 0.019008 0.000000 0.000000 0.318192 1.471147 3.367861 9.863580 40.276493 128.573828 65.340182 158.248501 58.799708 27.297875 155.084749 190.553413 82.754570 39.763432 131.167471 4.831462 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.004363 0.000000 12.232847 4.454641 9.646836 13.346731 0.220822 34.584083 21.834435 9.857928 164.860644 64.441342 177.068152 209.549961 292.830862 112.898308 95.391257 75.596053 0.000000 0.000001 0.000001 0.000000 0.000000 0.000000 0.000000 0.000000 36.155355 0.031164 27.017660 3.323254 41.169045 20.031561 0.020472 0.267904 6.467312 3.328968 55.180750 49.728680 244.161912 13.227688 173.461895 40.135761 16.678386 14.294364 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.079693 0.004631 0.000010 0.439000 24.163084 7.763451 0.004276 1.914963 18.786225 0.290164 1.088806 40.700870 59.991381 0.187427 75.931245 141.797836 9.513128 0.806444 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 6.933836 8.387877 0.025039 0.341367 12.786395 1.608408 11.514196 16.790264 122.690641 0.645593 55.723883 48.229917 56.197295 3.491971 0.000005 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.003778 14.347139 0.222802 29.294626 32.750169 10.553146 37.563229 23.807834 24.586892 34.740750 5.138429 135.442734 344.337475 104.496460 0.034712 0.000000 0.000000 0.000001 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000004 22.737971 1.501783 59.774668 18.434579 1.395610 149.582179 2.282020 9.063602 134.657855 164.046699 304.018879 450.830369 720.402890 1731.442683 

dadi_pipeline.py


#===========================================================================
# Import data to create joint-site frequency spectrum
#===========================================================================

#**************

#Create python dictionary from snps file
#dd = dadi.Misc.make_data_dict(snps)
dadi.Misc.make_data_dict_vcf("/media/molecularecology/Summit/CHTC/staging/Modern_Museum/fastqs/beagle/West_EAST_624_impute.vcf.vcf.gz", "/media/molecularecology/Summit/88_fastq/cleaned/AUG_GAMS/easySFS/easyp_pop2")
#dd = dadi.Misc.make_data_dict(fs)
fs = dadi.Spectrum.from_file("/media/Summit/CHTC/staging/Modern_Museum/fastqs/2d_Aug25.sfs")#, ['West', 'East'], projections = [30, 25], polarized = False)

#**************
#pop_ids is a list which should match the populations headers of your SNPs file columns
pop_ids=["West", "East"]

#**************
#projection sizes, in ALLELES not individuals should be 56, 50
proj = [30, 25]

#Convert this dictionary into folded AFS object
#[polarized = False] creates folded spectrum object
#fs = dadi.Spectrum.from_data_dict(dd, pop_ids=pop_ids, projections = proj, polarized = False)

#print some useful information about the afs or jsfs
print("\n\n============================================================================")
print("\nData for site frequency spectrum:\n")
print("Projection: {}".format(proj))
print("Sample sizes: {}".format(fs.sample_sizes))
print("Sum of SFS: {}".format(numpy.around(fs.S(), 2)))
print("\n============================================================================\n")

#================================================================================
# Calling external 2D models from the Models_2D.py script
#================================================================================
'''
 We will use a function from the Optimize_Functions.py script for our optimization routines:

 Optimize_Routine(fs, pts, outfile, model_name, func, rounds, param_number, fs_folded=True, 
                        reps=None, maxiters=None, folds=None, in_params=None, 
                        in_upper=None, in_lower=None, param_labels=None, optimizer="log_fmin")

   Mandatory Arguments =
    fs:  spectrum object name
    pts: grid size for extrapolation, list of three values
    outfile:  prefix for output naming
    model_name: a label to help label the output files; ex. "no_mig"
    func: access the model function from within 'moments_Run_Optimizations.py' or from a separate python model script, 
          ex. after importing Models_2D, calling Models_2D.no_mig
    rounds: number of optimization rounds to perform
    param_number: number of parameters in the model selected (can count in params line for the model)
    fs_folded: A Boolean value (True or False) indicating whether the empirical fs is folded (True) or not (False).

   Optional Arguments =
     reps: a list of integers controlling the number of replicates in each of the optimization rounds
     maxiters: a list of integers controlling the maxiter argument in each of the optimization rounds
     folds: a list of integers controlling the fold argument when perturbing input parameter values
     in_params: a list of parameter values 
     in_upper: a list of upper bound values
     in_lower: a list of lower bound values
     param_labels: list of labels for parameters that will be written to the output file to keep track of their order
     optimizer: a string, to select the optimizer. Choices include: "log" (BFGS method), "log_lbfgsb" (L-BFGS-B method), 
                "log_fmin" (Nelder-Mead method), and "log_powell" (Powell's method).

Below, I give all the necessary information to call each model available in the
Models_2D.py script. I have set the optimization routine to be the same for each
model using the optional lists below, which are included as optional arguments for
each model. This particular configuration will run 4 rounds as follows:
Round1 - 10 replicates, maxiter = 3, fold = 3
Round2 - 20 replicates, maxiter = 5, fold = 2
Round3 - 30 replicates, maxiter = 10, fold = 2
Round4 - 40 replicates, maxiter = 15, fold = 1

If this script was run as is, each model would be called and optimized sequentially;
this could take a very long time. For your actual analyses, I strongly recommend
creating multiple scripts with only a few models each and running them
independently. It is also not a good idea to mix models from the Diversification Set
and the Island Set, as each was meant to be mutually exclusive.

'''

#create a prefix based on the population names to label the output files
#ex. Pop1_Pop2
prefix = "825".join(pop_ids)

#**************
#make sure to define your extrapolation grid size (based on your projections)
pts = [200,300,400]

#**************
#Set the number of rounds here
rounds = 4

#define the lists for optional arguments
#you can change these to alter the settings of the optimization routine
reps = [10,20,30,40]
maxiters = [3,5,10,15]
folds = [3,2,2,1]

#**************
#Indicate whether your frequency spectrum object is folded (True) or unfolded (False)
fs_folded = True

'''
Diversification Model Set

This first set of models come from the following publication:

    Portik, D.M., Leache, A.D., Rivera, D., Blackburn, D.C., Rodel, M.-O.,
    Barej, M.F., Hirschfeld, M., Burger, M., and M.K.Fujita. 2017.
    Evaluating mechanisms of diversification in a Guineo-Congolian forest
    frog using demographic model selection. Molecular Ecology 26: 5245-5263.
    doi: 10.1111/mec.14266

'''

# Split into two populations, no migration.
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "no_mig", Models_2D.no_mig, rounds, 3, fs_folded=fs_folded,
                                        reps=reps, maxiters=maxiters, folds=folds, param_labels = "NY1, WI2, T")

...etc.
dportik commented 1 year ago

Hello @cistarsa, Have you tried running a 1D analysis on each population in dadi?

This could help determine if: 1) the stairway plots are giving different results due to a different underlying algorithm 2) there is actually a flip occurring in the populations in the 2D models

I don't have any experience working with ANGSD or realSFS, so can't comment on your commands used.

dportik commented 1 year ago

No response from OP.