This is true; perhaps it's best addressed directly. How did we end up here? Was it that we ran with what we thought were "best practices" protocols for each code and attempted to resolve differences later?
It may be worth addressing with two prongs: 1) Adding something in the paper explaining how it would be worth digging into some of these differences in future work using the same inputs (??) 2) Admitting honestly to the reviewer that we had done a ton of calculations already before realizing protocols were so different, and that if we were designing it from scratch we might do it differently, but we lack the resources to repeat at this point.
We have decided to use TI, and depending on the softcore potentials and on how bonded scaling is implemented, the number of lambdas may have to be adjusted because of the steepness of the dV/dl curve. Indeed, I had to adjust this "manually" for AMBER (although only by visual inspection, not by any rigorous approach to controlling the integration of the dV/dl curve). Maybe we just show a few such curves for the various software packages in the SI to demonstrate how different they can be?
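To make that concrete, here is a minimal sketch (Python, with made-up numbers rather than any of our actual gradients) of the trapezoidal bookkeeping I have in mind; a steep region of the curve shows up as a single interval carrying a disproportionate share of the total, which is the visual cue I used when adding windows:

```python
# Minimal sketch with illustrative numbers only (not data from the paper):
# integrate <dV/dlambda> over the lambda windows with the trapezoidal rule
# and report each interval's contribution, so that steep regions of the
# curve stand out as candidates for extra windows.
import numpy as np

lambdas = np.array([0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.85, 0.95, 1.0])
dvdl    = np.array([2.1, 3.0, 4.4, 6.9, 14.2, 31.5, 55.0, 72.3, 80.1])  # kcal/mol, made up

# Total free energy change from trapezoidal integration of the TI curve.
dG = np.trapz(dvdl, lambdas)

# Per-interval contributions; a large jump suggests the spacing is too coarse there.
per_interval = 0.5 * (dvdl[1:] + dvdl[:-1]) * np.diff(lambdas)

print(f"dG = {dG:.2f} kcal/mol")
for (l0, l1), c in zip(zip(lambdas[:-1], lambdas[1:]), per_interval):
    print(f"  [{l0:.2f}, {l1:.2f}]  {c:6.2f} kcal/mol")
```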
The simulation times may need to be adjusted too, but we don't show any of that: nowhere do we demonstrate how the lambda schedule and simulation times would need to be adjusted to reach convergence.
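If we did want to show it, a quick way (again only a sketch on synthetic data, not anything from our runs) would be to recompute the TI estimate from growing fractions of each window's dV/dl time series and check whether the running estimate has plateaued:

```python
# Minimal sketch on synthetic data: re-estimate the TI free energy from
# growing fractions of each window's dV/dlambda time series. If the running
# estimate has not plateaued, that window needs more sampling.
import numpy as np

rng = np.random.default_rng(0)
lambdas = np.linspace(0.0, 1.0, 11)
# Hypothetical per-window time series of dV/dlambda (one value per saved frame).
series = [rng.normal(loc=50.0 * lam**2, scale=5.0, size=5000) for lam in lambdas]

for frac in (0.25, 0.5, 0.75, 1.0):
    means = [s[: int(frac * len(s))].mean() for s in series]
    dG = np.trapz(means, lambdas)
    print(f"first {int(frac * 100):3d}% of the data: dG = {dG:.2f} kcal/mol")
```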
Nevertheless, I do not think that the same set of parameters will mean the same level of convergence, efficiency, etc.
Also, cutoffs are implemented in very different ways: shifted, scaled, hard truncation, etc. That alone guarantees numerical differences, so it is not hard to see why setting the cutoff in all codes to exactly the same numerical value does not automatically translate into an "equivalent" protocol.
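As a toy illustration (parameters are arbitrary, and none of this is meant to reproduce any particular code's scheme), a plain truncated LJ pair term, a potential-shifted variant, and a switched variant all nominally use the same 10 Å cutoff yet give different pair energies near it:

```python
# Toy illustration, not any package's actual cutoff scheme: three common ways
# of applying "the same" 10 A cutoff to a LJ pair energy give different values
# near the cutoff, so identical cutoff settings are not an identical protocol.
import numpy as np

def lj(r, eps=0.2, sigma=3.4):
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6**2 - sr6)

r_c = 10.0                      # cutoff distance (A)
r_on = 9.0                      # switching starts here for the switched variant
r = np.linspace(8.0, 10.0, 5)

hard    = np.where(r < r_c, lj(r), 0.0)            # hard truncation: discontinuous at r_c
shifted = np.where(r < r_c, lj(r) - lj(r_c), 0.0)  # potential shift: energy is 0 at r_c
x = np.clip((r_c - r) / (r_c - r_on), 0.0, 1.0)    # simple cubic switching function
switched = lj(r) * (3.0 * x**2 - 2.0 * x**3)

for row in zip(r, hard, shifted, switched):
    print("r = {:5.2f}  hard = {:+.4f}  shifted = {:+.4f}  switched = {:+.4f}".format(*row))
```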
My argument is this: in the end we wanted to show that it is possible to converge all codes to the "same" free energy. That did work (to some degree). The exact protocol is only of secondary importance, or "the end justifies the means". We have no guarantee that "protocol equivalence" would do any better, or that it is even desirable.
I think showing some curves (that are different) for different software packages may be a very reasonable way to demonstrate that this was needed.
Can you help with some language on this for the paper itself?
Ok, let's collect some curves then. Maybe everyone puts a few samples online and we then choose those that look the most different?
Yes, I can help with this.
I have added some additional text just in the introductory part of Methods. Now I would need a few TI curves in the SI to support this.
@halx what TI curves do you want to compare between codes? I foresee difficulties since we have a mix of unified and split protocols for the relative calculations. Should we check the absolute calculations first? The gradients in the LJ decoupling step should be the most sensitive to the softcore parameters.
Essentially, whatever proves the point best :-) I would think too that LJ curves may work best. We can do absolute ones too as the unified protocol works for all codes.
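For the softcore sensitivity point, something like this toy comparison might already be convincing (a Beutler-style softcore LJ pair term with two different alpha values; conventions and defaults differ between the codes, so the numbers below are purely illustrative and not taken from any package):

```python
# Toy comparison, not any code's default parameters: a Beutler-style softcore
# LJ pair term evaluated at a clashing distance for two alpha values. The
# lambda-dependence (and hence the dV/dlambda curve to be integrated) differs
# noticeably even though both choices share the same physical end states.

def softcore_lj(r, lam, alpha, eps=0.2, sigma=3.4):
    # Decoupling direction: lam = 0 fully coupled, lam = 1 fully decoupled.
    denom = alpha * lam + (r / sigma) ** 6
    return 4.0 * eps * (1.0 - lam) * (1.0 / denom**2 - 1.0 / denom)

r = 1.0  # a short, clashing distance where softcore behaviour matters most
for alpha in (0.2, 0.5):
    vals = [softcore_lj(r, lam, alpha) for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(f"alpha = {alpha}: " + "  ".join(f"{v:10.2f}" for v in vals))
```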
The general design of the experiments is very strange. The authors spend a great deal of time discussing small technical differences between the various software packages. However, the most important parameters affecting the results, which are trivial to control, are not set equal, e.g. the number of lambda values, the length of the simulations, and the cutoff distance. This is a total mystery to me. Can you really expect that calculations using an 8 or 12 Å cutoff give results that are identical to within 0.1 kcal/mol? This must be discussed and explained. It is a most annoying shortcoming of this otherwise so interesting study.