New models to test Reduce2 on

(done) 1ehz

Original reduce crashes on my laptop with a subscript out of range error in debug mode. In analyzeRes() under else if (ProcessConnHydOnHets) in isAromMethyl() in findRingBondedToMethyl() on line 109 in !visited[x2c->order()]. _order is 42 and visited has 42 elements, so it is one past the end. This is in CTab.cpp.

It looks like the CTab::parseResidue() method is using 1-indexed values rather than 0-indexed values rather than 0-indexed values when calling put(). Adding one more to the vector length avoids the crash. The values are inserted and read back using the same indexing, so there is not a problem with looking up the correct index. Adding one to the vector length will solve the problem cleanly, along with a comment about why it is needed. Fixed in version 4.12

(done) The two hydrogens on n4 c a 60 are not placed in Reduce but they are in Reduce2. They are placed on other c's, for example n4 c a 48. Jane thinks that they probably should be placed.

The overall score is improved in Reduce2 compared to Reduce. Most of the Movers had better scores in Reduce2. Several had slightly worse scores. Some of the exceptions:

AromaticMethylRotator 0 'a' A YYG 37 (listed twice): -2.21 out of 10.425. There are two rotators placed on Carbons in this molecule, which is why we see it listed twice; they are very close in both and pointed away from contacts. There are a number of non-rotatable hydrogens that are pointed differently in the two cases, which is surely what is causing the lower score.
SingleHydrogenRotator 0 'a' A G 18: -3.367 out of 51.588. The Reduce and Reduce2 directions for this hydrogen are both very similar and pointing away from contacts. There are a number of non-rotatable hydrogens that are pointed differently in the two cases, which is surely what is causing the lower score.

(removed)) 1evv

mmtbx.reduce2 D:/data/Richardsons/1evv.pdb output.description_file=1evv.txt output.file_name=1evvnewFH.pdb output.overwrite=True add_flip_movers=True alt_id=a comparison_file=D:/data/Richardsons/1evvFH.pdb

(done) The above command failed when doing comparisons. It did not fail when not doing comparisons. The failure was:

Sorry: Fatal problems interpreting model file:
  Number of atoms with unknown nonbonded energy type symbols: 4
    Please edit the model file to resolve the problems and/or supply a
    CIF file with matching restraint definitions, along with
    apply_cif_modification and apply_cif_link parameter definitions
    if necessary.

Looks like some of the Hydrogens attached to the SPM residue in the Reduce output file were not known to the system. This is presumably due to atom-name updates in the dictionaries between the two versions.

(done) 1u45

Sorry: Fatal problems interpreting model file:
  Number of atoms with unknown nonbonded energy type symbols: 23
    Please edit the model file to resolve the problems and/or supply a
    CIF file with matching restraint definitions, along with
    apply_cif_modification and apply_cif_link parameter definitions
    if necessary.

There is a residue of type '8OG' that has 23 atoms, and hydrogenate reports that hydrogens were not placed there because no restraints were found. There is not an entry for this in geostd. There is an entry with this name in C:\tmp\cctbx_phenix\modules\chem_data\chemical_components\8\data_8OG.cif that has atoms with the same names as this residue. Apparently, these files contain information but not restraints and "eLBOW can create restraints from these files using the --chemical_component option".

Running phenix.elbow --chemical_component 8OG produced an 8OG.cif file that I added to the hydrogenate command line, which claimed to use it for restraints but still could not find the restraints when I ran it. @todo: Nigel can reproduce and is looking into this.

(done) Note that https://phenix-online.org/documentation/reference/refinement.html describes how to add CIF modifications and links using apply_cif_modification and apply_cif_link to get custom ones; are these needed when we're using the ones from the chemical_component library? It looks like maybe these instructions are only for refinement. The mmtbx/monomer_library/pdb_interpretation.py script is the one that prints the message.

This file in chemical_component is not a restraints file. The bond loop has no ideal bond length. Nigel generates restraints on the fly when he needs them. 8OG is RNA, which is difficult. The solution is to run phenix.eLBOW --chem=8OG, which will look it up and create restraints. Nigel says that we should not do this inside of Reduce because it will slow things down. Nigel asserts that people trying to run Reduce should have restraints because they can't otherwise do things with their model. Jane points out that if they only want to do validation, then they need an approach; what to do about things from the PBD? We used to use the HET dict. This dictionary had to be updated. Jane: People running Molprobity will want to be able to validate a file from the PDB. Dorothee: we could also add a command-line option to let it pass when a ligand does not have hydrogens. Jane recommends making a place where we store restraints for all of the HET already in the PDB. Nigel has moved around 3/4 of the HETs in there already, once he has verified them. Things change often in the CIF files and he has to babysit this and watch out for bugs in eLBOW and so forth. He's been putting them into geostd. He doesn't want to put in bad restraints. Jane points out that it is even worse to leave the hydrogens out entirely. Dorothee points out that the PDB was asking people to submit the restraints, and it does not yet come out when you ask for the data.

(done) For now, we'll make it possible to run even with these errors using a command-line argument and MolProbity++ can parse the output and let the users know to add a CIF. Be sure that the output files list the residue names where there are problems.

(done) 145d

Sorry: Fatal problems interpreting model file:
  Number of atoms with unknown nonbonded energy type symbols: 148
    Please edit the model file to resolve the problems and/or supply a
    CIF file with matching restraint definitions, along with
    apply_cif_modification and apply_cif_link parameter definitions
    if necessary.

There are a number of residues that Hydrogenate could not place hydrogens on because no restraints were found: MCY (x3) and 5CM (x5).

(done) NH3: 1ubq Met 1 to Val 17 bb CO & 2 waters.

When testing against just Val 77 without waters, there is a slight difference in rotation between the two that the internal comparison code thinks is an improvement for Reduce2.

The waters are 110 and 98. When they are added, there is still a slight angle difference and the score is still higher for the Reduce2 placement. It is possible that we're seeing an improvement from the 15 degree coarse sampling rather than 30 degree?

(done) probe2 scores for reduce2 are lower than for the reduce solution in the new_reduce_regression code. They are very close (12.4 vs. 12.3) with reduce2 having 3 more H bonds (98 vs. 95). The score drop is in the O close_contact (3.2 vs. 3.1) with the same number of contacts listed for both. The angle is only epsilon different in these runs.

(done) Why do we get a larger difference in angle when we run reduce vs. reduce2 by hand on the laptop vs. when it is run in new_reduce_regression? Added as a separate issue: https://github.com/ReliaSolve/cctbx_project/issues/248.

(done) Add this as a regression test case.

(done) OH: 1yk4 Tyr 13 to Thr 28 bb CO.

(done) Reduce2 now seems to be operating as if it were using the probe radius rather than the dot radius to do its calculations. Nope, it is using the new approach. Why does it get a different angle with the new reduce calculation but the same with the old one? Switching the original Reduce to a finer coarse step (5 vs. 10) did not change the angle by much, but did by a little.

The new_reduce_regression script shows the score for the original reduce solution being a little better than the Reduce2 solution, and the comparison calculation within Reduce2 also shows it to be a bit better (0.2). Changing to a finer coarse step in Reduce2 (2 or 5 rather than 10) did not change things.

(done) Setting preference_magnitude=0 made Reduce2 come closer to the result from Reduce. Setting it to -1.0 made it match exactly. It looks like the preference magnitude is different between Reduce and Reduce2? But the penalty magnitude on RotDoner seems to be 0.0 (RotDonor.h) and when we print in that function it does show up, and the resulting penalties for coarse orientations are all zero, and setting -PEN0.0 does not change the orientation. Running -PEN-1.0 on Reduce does not change the behavior, which is consistent with there being no preference in the RotDonor class that is handling the rotations.

The penalty values in Reduce are multiplied by the negative pmag for Rot3Fold and FlipMemo. The penalty is then added to the score, which will make it smaller.

Reduce2 calls them preferenceEnergies and they are added to the score, making it better when they align.

Running only on Tyrosine with both Reduce and Reduce2 puts the hydrogen in plane; running with preference_magnitude=-1.0 puts it perpendicular in Reduce2; so the preferences seem to be the same between the two cases. Running Reduce2 with preference_magnitude=0 keeps in close to parallel, 3 degrees off from Reduce, so things are behaving about the same in the two cases.

(done) Running on the two-residue snip with -PEN0.0 and preference_magnitude=0 gets different angles for Reduce and Reduce2. The comparison in Reduce2 thinks that the score for Reduce is better than the score it got. It thinks that the scores are much closer when preference_magnitude=-1.0 (and it gets nearly the same answer). @todo: We're probably adding the preference in somewhere along the way when we should not, and the -1 is reverting that behavior. When we set the preference function to None, we get all zeroes for preference energies (as expected) but still get a slightly different angle (with worse score) than Reduce. (The angle for preference_magnitude=0 is the same as for a None preference function.) When we set preference_magnitude to -2 or -4 we get about the same behavior as -1, so it is not that -1 is exactly reverting an unexpected +1 addition. Setting to -0.5 gets slightly closer to the Reduce case (-1 is overshoot). Setting to -0.25 or -0.4 is not enough to pull it close.

(done) We surely don't expect to have Reduce2 pick a worse orientation on Probe2 score with preference_magnitude=0 and with no preference function. It must either not be sampling all angles or must be using different score calculation. It is indeed adding checks at every ten degrees from -180 through +170, so it appears to be checking all of them. Reduce2 Optimizers only score the atom-to-other contact, not the other-to-atom, so we should be using 'once' rather than 'both' Probe2 scoring for comparison. When we do this, we get similar scores for Reduce2 and Reduce solutions. Reduce is using one-way comparisons to optimize as well, so this still doesn't explain why there is a difference. Reduce2 thinks that our score is very slightly worse than the other so it may be a symmetric answer due to slightly different angle samples. Indeed, it appears to be a symmetric rotation on the other side of the atom. Actually, the symmetry is with a hydrogen atom that is overlapped, but the improved score is because of contact with a remote O that is marked as a clash with the OH in the Reduce orientation.

Maybe single-hydrogen rotators prefer to be perpendicular to the ring. The Tyr sample on PDB does... nope, Jane reports that they should be parallel and that is what both Reduce and Reduce2 do.

(done) Consider whether we want to add preference energies in the comparisons, or at least report what they are.

(done) Rerun comparison test cases with new code that includes preference angles handled all the way through and no preference energies for single-hydrogen rotators. We're getting many more Better results when this and https://github.com/ReliaSolve/cctbx_project/issues/251 are added.

ReliaSolve / cctbx_project

New models to test Reduce2 on #245