Question about the simulations

Hi Sarah, I'm glad you have found it useful for your work. I have some answers to your questions below:

Is this the time for one replicate, or for the full 100 replicates? In general, the simulations should be a little faster than the empirical fitting. However, run time is relative, and more complex models will always take longer to fit. I would first compare how long the simulations take with how long the empirical optimization routine took; if the simulations take much longer, something else may be going on.
To pick up right where you left off, you could manually change the starting number in the Optimize_Function_GOF.py script. For your specific example, after 30 simulations have finished, leave the number of sims at 100 and change only the starting point of the range in line 359, from:

for i in range(1,(sims+1)):

to:

for i in range(31,(sims+1)):

and the analysis will pick up at 31 and continue to 100. However, an additional header line will likely be inserted into the output summary file (Simulation_Results.txt), and you will need to remove it before using the R plotting script.
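If you would rather not remove the extra header by hand, something like the following might help. It is only a minimal sketch, not part of the pipeline: the function names are hypothetical, and it assumes the first line of Simulation_Results.txt is the header and that the extra header is an exact copy of it.

```python
from pathlib import Path


def drop_repeated_headers(lines):
    """Keep the first line as the header and drop later exact copies of it.

    Assumption: a repeated header is character-for-character identical
    to the first line of the file.
    """
    if not lines:
        return []
    header = lines[0]
    return [header] + [line for line in lines[1:] if line != header]


def clean_results_file(path):
    """Rewrite the file at `path` in place without repeated header lines."""
    results = Path(path)
    cleaned = drop_repeated_headers(results.read_text().splitlines())
    results.write_text("\n".join(cleaned) + "\n" if cleaned else "")
```

Calling clean_results_file("Simulation_Results.txt") from the output directory would rewrite the file in place, so it is worth keeping a backup copy first.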

If the simulations are not running fast enough, you could also run four instances of the script in different directories (e.g., with 25 simulations each) and then combine the output files manually. I don't think this would interfere with the R plotting, but you could always renumber the simulations in the output file by hand if it does. Because dadi does not use multithreading, the fastest way to speed up the simulations is to break them into smaller jobs and use all the cores you have available. Make sense?
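Combining and renumbering the separate output files could also be scripted. The sketch below is an illustration under assumptions I have not checked against the actual output: that each Simulation_Results.txt is tab-delimited, starts with the same header line, and stores the simulation number in the first column. The function names and directory names are hypothetical.

```python
from pathlib import Path


def merge_results(chunks):
    """Merge several result files (each given as a list of lines).

    Keeps one header, skips blank lines and repeated headers, and
    renumbers the first tab-delimited column 1..N in the order given.
    Assumption: the first column holds the simulation number.
    """
    merged = []
    counter = 0
    for lines in chunks:
        if not lines:
            continue
        if not merged:
            merged.append(lines[0])  # header from the first non-empty file
        for row in lines[1:]:
            if not row.strip() or row == merged[0]:
                continue
            counter += 1
            fields = row.split("\t")
            fields[0] = str(counter)
            merged.append("\t".join(fields))
    return merged


def merge_result_files(paths, out_path):
    """Read each file in `paths`, merge them, and write `out_path`."""
    chunks = [Path(p).read_text().splitlines() for p in paths]
    Path(out_path).write_text("\n".join(merge_results(chunks)) + "\n")
```

For example, with four hypothetical run directories run1 to run4, merge_result_files([f"run{i}/Simulation_Results.txt" for i in range(1, 5)], "Simulation_Results.txt") would write one combined file numbered 1 to 100.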

dportik / dadi_pipeline

Question about the simulations #10