MobleyLab / Lomap

Alchemical mutation scoring map
MIT License
37 stars 17 forks source link

Total charge comparison threshold #19

Closed shuail closed 6 years ago

shuail commented 7 years ago

I loose the total charge comparison threshold from 1e-3 to 1e-2 and commit to shuaidev branch. It improves our test cases performance (put molecules which we thought have identical net charge into the same group) while more validation may needed. If this loose threshold cause any problems, we could discuss here or change to a net charge comparison instead of summing all partial charges which we are using in the code now?

davidlmobley commented 7 years ago

For the record, this was implemented by #18

@shuail - can you remind me the units of this threshold? Is this a score threshold, or a charge threshold? That is, is it in units of electrons, or an exponential of some factor?

The main concern I would have here would just be that if you have molecules which have slightly non-integer charge due to rounding, you can potentially get a contribution to the relative binding free energy by allowing the charge to change. Specifically, imagine the charging free energy of an ion is 100 kcal/mol, and you are mutating ligand A, with a charge of 0, into ligand B, with a charge of 1e-2 electrons (due to rounding error). To a first approximation we can assume linear response in water, but the binding site might be much less polar and less responsive. So for a worst case scenario let's assume that a change in charge of 1e-2 makes no difference in the binding site, but it gives a linear difference in water. So the 1e-2 change contributes (1e-2 electrons)*100 kcal/(mol electron) in water but 0 in the binding site, so your total relative binding free energy ends up with a contribution of 1 kcal/mol due to rounding error in your partial charges.

I think originally I chose a rather small threshold to make sure that this error would be very small even in such cases -- basically, I make sure the charges in my mol2 files are VERY precise so that rounding error in the charges remains smaller than 1e-3. I also went so far as to get OpenEye to adjust the precision used for charges in their mol2 files for exactly this reason: A small rounding error on charges for free energy calculations could introduce significant errors.

shuail commented 7 years ago

@davidlmobley The units of this threshold is in the units of electrons. And I have an example here to explain this issue.

Below are the molecules:

Molecule 1:

@MOLECULE 183L.lig 17 18 1 0 0 SMALL bcc

@ATOM 1 C1 27.8600 8.1280 4.2770 C.2 1 LIG -0.117200 2 H1 28.1090 8.9960 4.8630 H 1 LIG 0.144000 3 C2 28.3280 8.0190 2.8980 C.2 1 LIG -0.173200 4 H2 28.9420 8.7960 2.4750 H 1 LIG 0.142000 5 C3 27.8090 6.7540 2.3390 C.3 1 LIG -0.019900 6 H3 27.2120 6.9620 1.4890 H 1 LIG 0.074700 7 H4 28.6200 6.1330 2.0580 H 1 LIG 0.074700 8 C4 26.3310 4.9670 3.4680 C.ar 1 LIG -0.115000 9 H5 26.2880 4.3150 2.6120 H 1 LIG 0.132000 10 C5 25.6900 4.6140 4.6350 C.ar 1 LIG -0.129000 11 H6 25.1140 3.7050 4.6790 H 1 LIG 0.130000 12 C6 25.7550 5.3850 5.7660 C.ar 1 LIG -0.133000 13 H7 25.2690 5.0580 6.6700 H 1 LIG 0.130000 14 C7 26.4400 6.5710 5.7400 C.ar 1 LIG -0.107000 15 H8 26.4610 7.2000 6.6140 H 1 LIG 0.134000 16 C8 27.0950 6.9300 4.5690 C.ar 1 LIG -0.071800 17 C9 27.0070 6.1430 3.4330 C.ar 1 LIG -0.094300 @BOND 1 2 1 1 2 3 1 2 3 4 3 1 4 5 3 1 5 6 5 1 6 7 5 1 7 9 8 1 8 10 8 ar 9 11 10 1 10 12 10 ar 11 13 12 1 12 14 12 ar 13 15 14 1 14 16 1 1 15 16 14 ar 16 17 5 1 17 17 8 ar 18 17 16 ar @SUBSTRUCTURE 1 LIG 1 TEMP 0 0 ROOT

Molecule 2

@MOLECULE 182L.lig 15 16 1 0 0 SMALL bcc

@ATOM 1 O1 27.6840 6.1560 2.3190 O.3 1 LIG -0.204200 2 C1 28.4310 7.3150 2.4070 C.2 1 LIG -0.014900 3 H1 29.0400 7.6270 1.5760 H 1 LIG 0.189000 4 C2 28.2590 7.9040 3.6080 C.2 1 LIG -0.177200 5 H2 28.7250 8.8530 3.8140 H 1 LIG 0.165000 6 C3 27.4200 7.0120 4.3480 C.ar 1 LIG -0.106800 7 C4 26.9090 7.0380 5.6140 C.ar 1 LIG -0.083000 8 H3 27.1770 7.8350 6.2870 H 1 LIG 0.138000 9 C5 26.0640 6.0480 6.0060 C.ar 1 LIG -0.149000 10 H4 25.6550 6.0520 7.0020 H 1 LIG 0.133000 11 C6 25.7060 5.0210 5.1570 C.ar 1 LIG -0.113000 12 H5 25.0280 4.2550 5.4940 H 1 LIG 0.135000 13 C7 26.1910 4.9500 3.8930 C.ar 1 LIG -0.121000 14 H6 25.8830 4.1490 3.2420 H 1 LIG 0.150000 15 C8 27.0710 5.9320 3.5060 C.ar 1 LIG 0.058100 @BOND 1 2 1 1 2 3 2 1 3 4 2 2 4 5 4 1 5 6 4 1 6 7 6 ar 7 8 7 1 8 9 7 ar 9 10 9 1 10 11 9 ar 11 12 11 1 12 13 11 ar 13 14 13 1 14 15 1 1 15 15 6 ar 16 15 13 ar @SUBSTRUCTURE 1 LIG 1 TEMP 0 0 ROOT

Yes, I think the charge similarity is affected by the precision of the partial charges. For molecule 1, the total charge (summing of the partial charges) is 0.001 and for molecule 2, the total charge is -0.001. So the difference of the total charge is 0.002. If the threshold is 1-e3, then these two molecules have different charges and if the threshold is 1-e2, they will have the same total charges. While in this case, I think they all have zero total charges so that's why I choose to use the 1-e2 threshold.

davidlmobley commented 7 years ago

Yes, I think the charge similarity is affected by the precision of the partial charges. For molecule 1, the total charge (summing of the partial charges) is 0.001 and for molecule 2, the total charge is -0.001. So the difference of the total charge is 0.002. If the threshold is 1-e3, then these two molecules have different charges and if the threshold is 1-e2, they will have the same total charges. While in this case, I think they all have zero total charges so that's why I choose to use the 1-e2 threshold.

@shuail - yes, this is exactly the issue I was describing. Note the charges on your atoms are all rounded to just three significant digits of precision or less (e.g. -0.117200 or 0.083000 or 0.058100) though far more digits of precision are available. This naturally means you'll accumulate significant rounding error as you sum the charges, which: a) results in the behavior you noted where the net charge may not quite be integer so you need to adjust the threshold for the mapper b) ALSO can cause the behavior I described where the same issue could introduce errors in computed free energy differences

I'd suggest that you instead add additional precision for your charges.

There is actually utility functionality in ParmEd that can do this for you -- it provides a "fix_charges" function which can make your charges sum to an integer. :) See https://github.com/ParmEd/ParmEd/issues/268 and https://github.com/ParmEd/ParmEd/pull/271.

So, I want to emphasize: a better fix is NOT to adjust the threshold for charge similarity, but to make sure your charges sum to an integer. The former is just papering over the problem and could introduce error; the latter avoids the threshold problem AND the error.

shuail commented 7 years ago

@davidlmobley I see the point now, yes, the sum of partial charges is not zero in these molecules. I think for this dataset, I just download the mol2 files instead of generating from smile strings myself. So it probably has bad precision. I agreed that the total charge should be integer for input mol2 files for running the simulations. While for the original code with 1-e3 threshold, when I generate the graph, it's hard to distinguish cases with different total charge in 1 electron unit or in 0.002 electron unit. So should we modify the code to report a Warning of the ligands with non-integer total charge?

davidlmobley commented 7 years ago

@shuail :

I think for this dataset, I just download the mol2 files instead of generating from smile strings myself. So it probably has bad precision.

It's not just that -- it's fairly common to have mol2 files which have exactly this issue (partly because we don't think we KNOW partial charges that precisely, so we usually round to not very much precision). Normally we think this is unimportant, but this is one case where it IS important for the reasons I described.

The best solution is just to make sure that you process molecules before feeding them in to ensure that they have charges very close to integer.

I agreed that the total charge should be integer for input mol2 files for running the simulations

Note they won't necessarily be quite integer because of rounding. The issue is how close to integer you are -- you need to be "close enough", which is what this tolerance is about.

I guess what I'm saying is that there are two issues: 1) You should pre-process your mol2 files to ensure you have basically integer charges to a fairly high level of precision (which the code I linked to earlier can do for you) 2) You should set this threshold sufficiently small (probably 1e-3) that it will never introduce significant error

So should we modify the code to report a Warning of the ligands with non-integer total charge?

Probably, yes. You probably want it to tell you that you should be pre-processing your ligands to ensure the charges are within tolerance of integer. :)

shuail commented 7 years ago

Sounds reasonable to me. I will modify the code back into 1-e3 threshold for detecting different total charges and add Warnings if the individual total charge is not close enough to integer by 1-e3 threshold.

davidlmobley commented 6 years ago

I think this was resolved by #28 ? Were any others?

shuail commented 6 years ago

@davidlmobley this issue is not resolved by #28 , create a new pull request #29 to address this.

davidlmobley commented 6 years ago

Resolved by #29 .