ddarriba / modeltest

Best-fit model selection

Regarding ModelTest reproducibility #7

Open ddarriba opened 8 years ago

ddarriba commented 8 years ago

It would be desirable that, given the same input data, ModelTest always produces the same results regardless of the selected parallelization approach: not only the same best-fit model, but also the same likelihood scores and, therefore, the same model/partitioning selection scores.

However, using different starting points (model parameters + topology + branch lengths) for the optimization of each candidate substitution model not only has a significant effect on the execution time, but can also affect the final log-likelihood and parameter values, and thus the selection process itself. One can expect, for example, the alpha shape parameter to be similar between +G models, or the proportion of invariant sites between +I models. Hence, in order to accelerate the model parameter optimization, carrying the optimal parameter values over from one model to the next seems to be a good approach (to the extent possible, given the model constraints).
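
For illustration, here is a minimal C++ sketch of what such a parameter carry-over could look like. The types and function names (`ModelParams`, `CandidateModel`, `optimize`, `optimize_sequence`) are hypothetical and not part of the ModelTest code; `optimize` is only a placeholder so the example is self-contained.

```cpp
#include <vector>

// Hypothetical container for the free parameters shared between candidate models.
struct ModelParams {
  double alpha = 0.5;                                    // gamma shape (+G)
  double pinv  = 0.0;                                    // proportion of invariant sites (+I)
  std::vector<double> rates{1.0, 1.0, 1.0, 1.0, 1.0, 1.0}; // GTR exchangeability rates
};

struct CandidateModel {
  bool has_gamma = false;
  bool has_pinv  = false;
  ModelParams optimized;  // filled in by optimize()
};

// Placeholder for the real per-model likelihood optimization; here it simply
// records the starting point so the sketch compiles on its own.
void optimize(CandidateModel &model, const ModelParams &start) {
  model.optimized = start;
}

// Optimize a sequence of candidate models, reusing the optimum of each model
// as the starting point of the next one, to the extent the models share
// parameters.
void optimize_sequence(std::vector<CandidateModel> &models) {
  ModelParams carry;                          // fixed defaults for the first model
  for (auto &m : models) {
    ModelParams start = carry;
    if (!m.has_gamma) start.alpha = 1.0;      // reset parameters the model lacks
    if (!m.has_pinv)  start.pinv  = 0.0;
    optimize(m, start);
    carry = m.optimized;                      // carry the optimum forward
  }
}
```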

Nevertheless, with different high-level (task-level) parallelization schemes, the sequence in which the models are optimized may change, and with it the starting parameters. The problem arises when the starting parameter values affect not only the execution times but also the results of the optimization. There are several possible solutions:

  1. Use fixed starting parameters (e.g., alpha = 0.5, pinv = 0.0, substitution rates = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0}). Reproducibility is guaranteed, but this slows down the optimization process considerably.
  2. Take the optimized parameters of one single model (e.g., GTR+G, JTT+G) as the starting point for all the others. Reproducibility is guaranteed, but the task-level parallel execution is delayed until that first model has been optimized.
  3. Somehow compute a quick yet accurate estimate of the starting model parameters. Is this possible for every parameter?
  4. Carry the parameters over between models within each execution thread (by thread here I mean a thread or a process that sequentially optimizes a subset of the candidate models). This speeds up the process, but the results might not be exactly reproducible if more than one thread is used (because of the dynamic distribution of models to the threads).

4b. If the candidate models are distributed statically among the threads, the results will be reproducible as long as the same number of execution threads is used, but this can increase the workload imbalance (see the sketch below).
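
To make the difference between options 4 and 4b concrete, here is a minimal C++ sketch (hypothetical, not the ModelTest implementation) contrasting a static round-robin distribution of models with a dynamic, counter-based distribution. With the static split, the model-to-thread mapping, and therefore each thread's parameter carry-over chain, is fixed for a given thread count; with the dynamic split it depends on run-time timing.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Option 4b: static round-robin assignment. Deterministic for a fixed thread
// count, but the per-thread workload may be unbalanced.
void process_static(int n_models, int n_threads) {
  std::vector<std::thread> pool;
  for (int t = 0; t < n_threads; ++t) {
    pool.emplace_back([=]() {
      for (int m = t; m < n_models; m += n_threads)
        std::printf("static : thread %d optimizes model %d\n", t, m);
    });
  }
  for (auto &th : pool) th.join();
}

// Option 4: dynamic assignment through a shared counter. Better load balance,
// but the order in which each thread sees models (and thus its starting
// parameters) can change from run to run.
void process_dynamic(int n_models, int n_threads) {
  std::atomic<int> next{0};
  std::vector<std::thread> pool;
  for (int t = 0; t < n_threads; ++t) {
    pool.emplace_back([&, t]() {
      for (int m = next++; m < n_models; m = next++)
        std::printf("dynamic: thread %d optimizes model %d\n", t, m);
    });
  }
  for (auto &th : pool) th.join();
}

int main() {
  process_static(8, 3);   // same mapping on every run with 3 threads
  process_dynamic(8, 3);  // mapping may differ between runs
}
```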