barbagroup / bempp_exafmm_paper

Manuscript repository for our research paper, including reproducibility packages for all results, and latex source files.

Round 3, Reviewer 3 #20

Open labarba opened 2 years ago

labarba commented 2 years ago

The authors stated that "The sphere has a radius of 1 Å, and 100 charges are placed randomly inside, representing the atoms in the solute." This is unphysical. There cannot be 100 atoms in a sphere of a radius of 1 Å. It is also unphysical to place charges randomly because some charges can be very close to each other and the associated electrostatic potential and force become singular or divergent. The PB model becomes invalid under this situation.

The authors cherry-pick one protein (1RCX) to validate their method. In the PB community, researchers typically use tens of proteins. The resulting convergence claim is not meaningful.

Unfortunately, there is a large difference between the extrapolated solutions of Bempp and 'trusted' APBS (i.e., 320 kcal/mol) for the cherry-picked protein 1RCX. Note that the energy of a hydrogen bond in solvent is typically only a few kcal/mol. Protein-protein and protein-ligand binding affinities are typically only in the range of 0 to -20 kcal/mol. The huge difference of 320 kcal/mol indicates that Bempp is neither reliable nor meaningful for practical applications. In the literature, it is well known that APBS, MIBPB, DELPHI, and PBSA have similar free energy predictions for over a hundred molecules (Accurate, robust, and reliable calculations of Poisson–Boltzmann binding energies, Journal of Computational Chemistry 38 (13), 941-948, 2017). This deep level of inconsistency with existing PB solvers may be the real reason for the authors' rejection of my suggestion of carrying out a comparison on Marcia Fenley and coauthors' set of 51 complexes (J Chem Theor Comput 9(8):3677–3685, 2013).

The authors state that MIBPB does not converge on 1A63 because its solvation energies are -586.50, -585.52, and -587.43 kcal/mol at three grid spacings: 1.0, 0.75, and 0.5 Angstrom. They seem not to understand that in the CHARMM force field, the radius of hydrogen is less than 1 Angstrom, which means some hydrogen atoms were not resolved at a grid size of 1.0 Angstrom, leading to oscillations. Note that there is no theoretical proof or theorem indicating that numerical solutions must be monotonic to be convergent. The authors should also test their Bempp on under-resolved meshes for 1RCX, such as N = 118708 and 59354.

The authors concluded that MIBPB is not convergent because its solutions vary from -586.50, to -585.52, to -587.43 kcal/mol in their test. Please note that the total energy difference is only about 2 kcal/mol across the three meshes. In comparison, Bempp's free energy varied by over 400 kcal/mol during the mesh refinement for 1RCX in Table 5, which is 200 times larger (Bempp is also inferior in terms of relative errors). As mentioned earlier, the discrepancy between the extrapolated solutions of APBS and Bempp is 320 kcal/mol, rendering unphysical results for computational biophysics. How can anyone believe Bempp is doing anything meaningful?

Marcia Fenley and coauthors' set of 51 complexes (J Chem Theor Comput 9(8):3677–3685, 2013) is an important benchmark test. Note that MIBPB is not the only method that has been tested on this set. Marcia Fenley and coauthors tested many PB solvers. DELPHI was also tested (see, for example, Accurate estimation of electrostatic binding energy with Poisson-Boltzmann equation solver DelPhi program, Journal of Theoretical and Computational Chemistry 15 (08), 1650071, 2016). If Bempp is as robust as the authors claim, it should only take a couple of days to finish the recommended test. Why do the authors not just do it, instead of fighting over a constructive suggestion? At this point, I am quite convinced that Bempp must behave badly on this benchmark test.

The comparison of time and memory cost with respect to error for APBS and Bempp in Figure 4 is designed to mislead. The two methods do not use the same parallel and GPU settings and cannot be compared for time and memory, not to mention that the reference solutions from the two methods differ by over 320 kcal/mol. If the solution is incorrect or unphysical, it is completely irrelevant how fast a method can generate it.

It is known that the biggest advantage of BEM for PB is its much lower memory usage. Discretization on the surface is O(N^2) while discretization on the 3D domain is O(N^3). However, as shown in Tables 4 and 5, the authors' method uses almost the same amount of memory as APBS. This makes their method less competitive.

The authors claim ``The workflow integrates an easy-to-use Python interface with optimized computational kernels, and can be run interactively via Jupyter notebooks, for faster prototyping''. Unfortunately, there is no independent verification for their claim. The authors have placed a license requirement on their software which prevents anonymous downloading and verification of their claims.

labarba commented 2 years ago

Reviewer comment:

The authors stated that "The sphere has a radius of 1 Å, and 100 charges are placed randomly inside, representing the atoms in the solute." This is unphysical. There cannot be 100 atoms in a sphere of a radius of 1 Å. It is also unphysical to place charges randomly because some charges can be very close to each other and the associated electrostatic potential and force become singular or divergent. The PB model becomes invalid under this situation.

Reply: Although it is true that this was a synthetic test case, the goal of this result is to analyze and compare the computational performance of two formulations ('direct' and 'derivative'). For this purpose, a physical interpretation of the experimental setup is beside the point. The spherical surface with 100 point charges inside is mathematically well posed, so the PB model in this setup is still valid. Nevertheless, to avoid any confusion or distraction, we repeated the same experiment using a real biomolecule (1A7M) instead, and revised the paper accordingly. The new results (Table 4 and Figure 4) substantiate the same performance claims; the findings are unchanged.

Reviewer comment:

The authors cherry-pick one protein (1RCX) to validate their method.

Reply: First, the section "Performance comparison with APBS" using 1RCX, as its name suggests, is not meant to show validation. It addresses performance.

Reviewer comment:

In the PB community, researchers typically use tens of proteins. The resulting convergence claim is not meaningful.

Reply: The result in the main paper (1RCX) is one of a larger test set of molecules, included in the Supplementary Material. Perhaps this was not evident to the reviewer. All of the molecules reported in the Supplementary Material show a difference of 2% or less between our implementation and the community software APBS. We addressed this reviewer's concern (in the previous round of review) by adding more proteins for code verification. Table 2 in the Supplementary Material includes 9 molecules, and in the latest revision, Table 3 in the Supplementary Material contains 9 additional Barnase-Barstar complexes.

Reviewer comment:

Unfortunately, there is a large difference between the extrapolated solutions of Bempp and 'trusted' APBS (i.e., 320 kcal/mol) for the cherry-picked protein 1RCX. Note that the energy of a hydrogen bond in solvent is typically only a few kcal/mol. Protein-protein and protein-ligand binding affinities are typically only in the range of 0 to -20 kcal/mol. The huge difference of 320 kcal/mol indicates that Bempp is neither reliable nor meaningful for practical applications. In the literature, it is well known that APBS, MIBPB, DELPHI, and PBSA have similar free energy predictions for over a hundred molecules (Accurate, robust, and reliable calculations of Poisson–Boltzmann binding energies, Journal of Computational Chemistry 38 (13), 941-948, 2017). This deep level of inconsistency with existing PB solvers may be the real reason for the authors' rejection of my suggestion of carrying out a comparison on Marcia Fenley and coauthors' set of 51 complexes (J Chem Theor Comput 9(8):3677–3685, 2013).

Reply: Although 320 kcal/mol may seem a large difference for practical applications, it is 3% of the total solvation free energy. Moreover, this 3% difference is on the extrapolated values, as grid spacing approaches zero, and is traceable to the modeling differences between a finite-difference method and a boundary element method.

These differences have been discussed in the following publications: Boschitsch, Fenley & Zhou, J. Phys. Chem. B, 2002; Boschitsch & Fenley, J. Comput. Chem., 2004; Geng & Krasny, J. Comput. Phys., 2013.

As Boschitsch and co-authors wrote, "In light of these advantages, the BEM, especially when accelerated with the fast methods described below, can provide a computational approach superior to FDM."

Comparing the differences between APBS and MIBPB, Delphi, and PBSA to those between APBS and Bempp is somewhat unfair, as APBS, MIBPB, Delphi, and PBSA are all finite-difference codes.

In the PB literature, the results obtained with the finest mesh are usually reported (we hardly ever see extrapolated values as mesh spacing goes to zero). Looking at the finest-mesh results from APBS and Bempp instead, the computed energies are -10803.16 and -10997.16 kcal/mol, respectively. This amounts to a ~1.8% difference, which is to be expected given the aforementioned modeling differences, and is comparable with what has been reported in the literature. The discrepancy is therefore not valid grounds for assessing our solver in particular: when simulating a physical problem with two different mathematical models, we cannot say which one is closer to the true value. Both are approximations to the true value, under their respective modeling conditions. We replied to this point in the previous round of review, without comment or acknowledgement from the reviewer on our rebuttal.
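For transparency, the ~1.8% figure follows directly from the two finest-mesh energies quoted above; a minimal check:

```python
# Finest-mesh solvation free energies (kcal/mol), as quoted above
apbs = -10803.16   # APBS (finite difference)
bempp = -10997.16  # Bempp (boundary element)

# Relative difference, taking the APBS value as the reference
rel_diff = abs(bempp - apbs) / abs(apbs)
print(f"{rel_diff:.1%}")  # → 1.8%
```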

Reviewer comment:

The authors state that MIBPB does not converge on 1A63 because its solvation energies are -586.50, -585.52, and -587.43 kcal/mol at three grid spacings: 1.0, 0.75, and 0.5 Angstrom. They seem not to understand that in the CHARMM force field, the radius of hydrogen is less than 1 Angstrom, which means some hydrogen atoms were not resolved at a grid size of 1.0 Angstrom, leading to oscillations. Note that there is no theoretical proof or theorem indicating that numerical solutions must be monotonic to be convergent. The authors should also test their Bempp on under-resolved meshes for 1RCX, such as N = 118708 and 59354.

The authors concluded that MIBPB is not convergent because its solutions vary from -586.50, to -585.52, to -587.43 kcal/mol in their test. Please note that the total energy difference is only about 2 kcal/mol across the three meshes. In comparison, Bempp's free energy varied by over 400 kcal/mol during the mesh refinement for 1RCX in Table 5, which is 200 times larger (Bempp is also inferior in terms of relative errors). As mentioned earlier, the discrepancy between the extrapolated solutions of APBS and Bempp is 320 kcal/mol, rendering unphysical results for computational biophysics. How can anyone believe Bempp is doing anything meaningful?

Reply: Again (as in previous rounds of rebuttal), we seem to be using the word "converge" with different meanings. Our use of the word "converge" corresponds to the behavior of numerical schemes where the error decreases at a fixed rate, proportional to a power of the mesh spacing. For example, a second-order finite-difference scheme converges as $\Delta x^2$, and thus in a log-log plot one sees the error decreasing with a slope of 2. By definition, if one is in the so-called asymptotic range (achieved with a sufficiently refined mesh), the error must decay monotonically at the rate that matches the truncation error of the scheme.

Although it is true that the difference in solvation energy between meshes in our tests with MIBPB was small, the difference between the coarse and medium meshes was smaller than the difference between the medium and fine meshes, indicating that convergence at an asymptotic rate was not achieved. Moreover, the oscillatory behavior did not allow us to compute an extrapolated value with Richardson extrapolation. We did not see this behavior with APBS, regardless of the radius of the hydrogen atom in CHARMM, and the solution was converging at the expected order of convergence, indicating that the mesh was well resolved.
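To make the criterion concrete, the observed order of convergence and Richardson extrapolation from three solutions on systematically refined meshes can be sketched as follows (the numerical values below are contrived for illustration, not taken from the paper; a constant refinement ratio r is assumed):

```python
import math

def observed_order(f_coarse, f_medium, f_fine, r=2.0):
    """Observed order of convergence p, from three solutions on
    meshes refined by a constant ratio r."""
    return math.log(abs(f_medium - f_coarse) /
                    abs(f_fine - f_medium)) / math.log(r)

def richardson(f_medium, f_fine, p, r=2.0):
    """Richardson-extrapolated value as mesh spacing goes to zero."""
    return f_fine + (f_fine - f_medium) / (r**p - 1)

# Monotone, contrived second-order sequence: extrapolation is well defined.
p = observed_order(-590.0, -588.0, -587.5)
print(p)  # → 2.0

# For an oscillatory sequence like MIBPB's (-586.50, -585.52, -587.43 kcal/mol),
# the successive differences do not shrink, so no asymptotic order can be
# extracted and Richardson extrapolation is not applicable.
```

The sign changes in the successive differences of the oscillatory sequence are exactly what prevented us from applying this procedure.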

As mentioned previously, the 320 kcal/mol difference is attributable to the model differences when equations are discretized with FD and BEM.

Reviewer comment:

Marcia Fenley and coauthors' set of 51 complexes (J Chem Theor Comput 9(8):3677–3685, 2013) is an important benchmark test. Note that MIBPB is not the only method that has been tested on this set. Marcia Fenley and coauthors tested many PB solvers. DELPHI was also tested (see, for example, Accurate estimation of electrostatic binding energy with Poisson-Boltzmann equation solver DelPhi program, Journal of Theoretical and Computational Chemistry 15 (08), 1650071, 2016). If Bempp is as robust as the authors claim, it should only take a couple of days to finish the recommended test. Why do the authors not just do it, instead of fighting over a constructive suggestion? At this point, I am quite convinced that Bempp must behave badly on this benchmark test.

Reply: In this revision, we computed binding energies for 9 Barnase-Barstar complexes and demonstrated that our results show agreement with the results from other PB software reported in the literature. See section “Binding energy calculations”.

Reviewer comment:

The comparison of time and memory cost with respect to error for APBS and Bempp in Figure 4 is designed to mislead. The two methods do not use the same parallel and GPU settings and cannot be compared for time and memory, not to mention that the reference solutions from the two methods differ by over 320 kcal/mol. If the solution is incorrect or unphysical, it is completely irrelevant how fast a method can generate it.

Reply: We performed the simulations on the same machine, with two Intel Xeon Gold 6148 CPUs. Both codes are multi-threaded, and we did not use GPUs in any experiment reported. Raw result files are fully available for inspection in our reproducibility package, at https://github.com/barbagroup/bempp_exafmm_paper/tree/master/repro-pack/runs/1RCX_performance

We clearly stated in the manuscript that "We ran all experiments on a single CPU node of Pegasus, a Linux cluster at the George Washington University. Each node is equipped with two 20-core Intel Xeon Gold 6148 CPUs (base frequency at 2.4 GHz, max turbo frequency at 3.7 GHz) and 192GB RAM." We would politely ask the reviewer to provide evidence of us using GPUs.

The "incorrect or unphysical" statement has been addressed in our response above.

Reviewer comment:

It is known that the biggest advantage of BEM for PB is its much lower memory usage. Discretization on the surface is O(N^2) while discretization on the 3D domain is O(N^3).

Reply: BEM produces small, dense linear systems, while FDM produces large, sparse linear systems. Fast BEM indeed has better scaling (O(N^2)) in both time and space than FDM (O(N^3)), where N denotes the number of discretization points per dimension. However, when talking about performance in practice, the asymptotic constant also matters: FDM has a smaller constant and poorer scaling; BEM has a larger constant and better scaling. That explains the crossover points in Figure 5. This study reflects such subtlety and gives readers an idea of the regime where each method performs best.

Therefore, contrary to the reviewer's general claim, BEM for PB will not always have much lower memory usage: it depends on the problem size and the target accuracy.
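The crossover argument can be illustrated with a toy cost model. The constants below are invented purely for illustration (real constants depend on the implementations); the point is only that a larger constant with better scaling loses on small problems and wins on large ones:

```python
# Toy memory model: fast BEM stores O(n^2) surface unknowns with a large
# constant; FDM stores O(n^3) volume unknowns with a small constant.
# Both constants are made-up, illustrative numbers.
C_BEM, C_FDM = 50.0, 1.0

def bem_mem(n):
    """Illustrative BEM memory cost; n = discretization points per dimension."""
    return C_BEM * n**2

def fdm_mem(n):
    """Illustrative FDM memory cost for the same n."""
    return C_FDM * n**3

# Below the crossover (n < C_BEM / C_FDM = 50), FDM is cheaper;
# above it, BEM's better scaling wins.
print(bem_mem(10) > fdm_mem(10))    # → True  (FDM cheaper for small problems)
print(bem_mem(500) < fdm_mem(500))  # → True  (BEM cheaper for large problems)
```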

Reviewer comment:

However, as shown in Tables 4 and 5, the authors' method uses almost the same amount of memory as APBS. This makes their method less competitive.

Reply: In the finest case:

- APBS: 34 GB memory usage, 1.4e-2 error
- Bempp: 30.1 GB memory usage, 2.2e-3 error

Bempp achieves almost one more digit of accuracy with a smaller memory footprint than APBS. One should always look at performance at the same level of accuracy for a fair comparison.

In sum, it is incorrect to compare Tables 4 and 5 alone, as there is no way of knowing which mesh densities give equivalent accuracy in Bempp and APBS. This is resolved in Figure 4, right panel, where memory usage is compared against error. We believe this is the fairest way to compare the two methods.

Reviewer comment:

The authors claim "The workflow integrates an easy-to-use Python interface with optimized computational kernels, and can be run interactively via Jupyter notebooks, for faster prototyping". Unfortunately, there is no independent verification for their claim. The authors have placed a license requirement on their software which prevents anonymous downloading and verification of their claims.

Reply: The exercise of looking at the eigenvalues is a verification of that claim. The Jupyter notebooks showing how easy it is to do such exploration are available openly on the GitHub repository for this manuscript. Anyone can independently verify our claims, including this reviewer, if interested.

Regarding the license, the reviewer's concerns are unfounded. We used the most permissive kind of license, allowing any use (even commercial) and any derivative works, without restriction. The software is fully open source, under a standard license approved by the Open Source Initiative (OSI); see https://opensource.org/licenses