ReliaSolve / Molprobity2

0 stars 0 forks source link

Check Reduce/Probe heavy-atom radii and donor/acceptor against CCTBX #119

Open russell-taylor opened 3 years ago

russell-taylor commented 3 years ago

It is possible that CCTBX has not yet implemented the different radii for the different cases.

Also check the donor/acceptor flags to see if they match. Reduce sometimes has both flags set for the same molecule...

Do this by adding the table from Reduce along with the atom-mapping functions into CCTBX/Probe and producing a function that can make the ExtraAtomInfo structure from these tables. Then compare the filled-in ExtraAtomInfo tables from the CCTBX case with those from Probe/Reduce and report differences.

This will provide a path to a client using whichever approach is desired, perhaps avoiding the need for some of the slow loading in CCTBX.

russell-taylor commented 3 years ago

(done) Be sure to handle the cases in fixupHeavyAtomElementType() when determining the appropriate types for atoms. (done) The carbon checks use a table that has two kinds of entries in it, one of which tells whether a residue is an amino acid (pull out into a separate set... the iotbx.pdb module contains a sub-module amino_acid_codes with one_letter_given_three_letter which we can check to see if we're in) and one of which checks the " C" atom in an amino acid and the ISACOFLAG on specific atoms in specific amino acids. Go ahead and pull these tables into the Python code so we can find the correct Carbon atom type.

@todo: The Nitrogen acceptor checks (not VDW radius) require a HET database in Reduce; we'd presumably like to avoid pulling that in... but we basically just need to know how many bonds it has, which we can get from the bond proxies. However, we'd also not have the bond proxies in the case where we're falling back to Probe/Reduce atom classification. Maybe we check the neighborhood around the Nitrogen looking for atoms within bonding distance and count them up that way.

@todo: Check reduce.cpp around line 849 that sets ambiguous N-terminal/fragment nitrogens as acceptors.

Probe handles Car in setAromaticProp(), called by setProperties(). It handles C=O in isCarbonylAtom() based on atom and residue name. It presumably handles Nacc (Nitrogen with the ACCEPTOR flag set) using its acceptor/donor code.

@todo: Also need to handle Har and Hpol and Ha+p, which Reduce handles in the tables in StdRes.H.cpp.

Probe handles Har in setAromaticProp() by checking whether the H atom is tagged as aromatic in a table. It handles Hpol in updateHydrogenInfo(). Ha+p is the same as Har in the Reduce table, so doesn't need to be distinguished in Probe -- both are the same size and marked as donors. Probe traverses neighboring covalent bond lists to determine the status of Hydrogens.

russell-taylor commented 3 years ago

(done) Asked whether the warning about atom type Co is a typo. If so, fix it to not warn. It is a typo; fix it not to warn.

russell-taylor commented 3 years ago

(done) Check the tables in Python against the Probe ones; this will ensure that Probe and Reduce match and that we've got the right values in these tables.

russell-taylor commented 3 years ago

(done) Rename the _legalResiduesForSpecialHydrogen to something that respects the fact that these are not hydrogens.

russell-taylor commented 3 years ago

(done) Add special-case check for shifted SE atom that is found in Probe code.

russell-taylor commented 3 years ago

(see below) Check Neutron distances.

russell-taylor commented 3 years ago

Added command-line option to ask for Neutron distances. There are a ton of differences, see why.

russell-taylor commented 3 years ago

The AtomTypes.py was reading implicit hydrogen distances instead of neutron distances due to a parsing error. Fixed and now the two have the same number of atom mismatches in Neutron and in electron-cloud distances.

russell-taylor commented 3 years ago

@todo Check hydrogen radii: Both electron-cloud and Neutron distances. Do this by running on a model file that has had hydrogens added.

Note: When we run phenix.reduce on 3dfr.pdb and then try and compare on the output file, we get an error interpreting the model:

Sorry: Fatal problems interpreting model file:
  Number of atoms with unknown nonbonded energy type symbols: 1
    Please edit the model file to resolve the problems and/or supply a
    CIF file with matching restraint definitions, along with
    apply_cif_modification and apply_cif_link parameter definitions
    if necessary.

However, after running mmtbx.hydrogenate the model can still be parsed.

Problems: (1) With dfr3.pdb: in electron-cloud mode, Reduce has a lot of Hydrogens with radius 1.22 that CCYBX has with radius 1.05. (2) In neutron mode, Reduce has a lot with 1.17 that CCTBX has with radii 1.05 or 1.22. Solutions: (1) Need to mark the hydrogens as being the proper type (aromatic and/or polar). (2) We have to tell the ener_lib to use neutron distances.

russell-taylor commented 3 years ago

@todo: See if Probe does additional processing on these atoms to change their status after the initial estimation. Check both radius and ACCEPTOR/DONOR.

russell-taylor commented 3 years ago

(done) Probe special-cases '?A??' atom names in :ASX:GLX:ASN:GLN:, replacing them with ' O ' before calling identifyAtom during the loading process. Reduce has a fixupAmbigAtomName() function called during loading that does a more complicated check to turn some of them into O or N. Used the Reduce approach.

If CCTBX is not doing this, then our table-based approach will get different answers for the models we read using it.

russell-taylor commented 3 years ago

@todo: Figure out how to enable proper comparisons between MolProbity approach and CCTBX approach:

russell-taylor commented 3 years ago

(resolved) Talk with Dorothee to see if we're not setting the right flags to have it report smaller VDW radii for Hydrogens in the neutron case. Otherwise, start the process of figuring out how to change them. Solution: Indeed, ener_lib needed to be told to use neutron distances.

russell-taylor commented 3 years ago

@todo: What files to check for further verification?

russell-taylor commented 3 years ago

The PDB has a Phenix installation, so we can make the system robust for them early on

russell-taylor commented 3 years ago

(done) Handle OXT, which is at the the C terminal of the protein chain (happens in 4z4d and others). Nigel says that when the lookup fails when we can find it by looking it up in the geometry restraints manager. It is in the geostd data. If I can't get this to work, send him a short reproduction code and we can figure it out.

6/11/2021: Asked Nigel about how to get the info from the restraints manager, sending him a reproduction script.

7/13/2021: Nigel added an approach to determining the VDW radii and modified my example script to use the new way. Merged master into native-probe branch and gave it a try. It did give responses for OXT, but the radii it gave differed from the original CCTBX approach and also gave different radii even for the same atom types (HOH O atoms, for example). Waiting for a new version.

New version got reasonable radii; will test against Probe results.

russell-taylor commented 3 years ago

See if METALIC_ATOM has some sort of different radius treatment:

See where the 1.4 comes from in CCTBX for Fe and what the label for that entry is to help determine this: It comes from the energy_lib_atom type, whose entries are ion_radius = 0.68, vdw_radius = 1.4, vdwh_radius = None, vdw_radius_neutron = None.

The radius values do not appear to be flipped in this table -- manageMetals() looks for covRad() to determine N-metal bonding, so is not treating them differently. However, their behavior is odd...

See if Reduce/Probe ignores them entirely...

(Making this into a separate issue) https://github.com/ReliaSolve/Molprobity2/issues/152

russell-taylor commented 3 years ago

(done) Also verify that the covalent radii are the same. Problem: ener_lib does not record the covalent radii for atoms, so we'd need to get them from yet another place inside CCTBX if we need them. We don't need them for Probe calculations but will need to find a way to infer them to look for bonding of flip Nitrogens to metals -- we'll presumably need to add a fixed offset to the ion_radius for any metals that are nearby the way the Probe/Reduce tables do.

Flip checking being done otherhow.

russell-taylor commented 3 years ago

(done) Consider how to handle radii for metallic atoms. CCTBX reports both VDW and ionic radii. The tables inside Probe and CCTBX flip these; the ionic radius is listed as the VDW and the VDW as the "Covalent" (this radius doesn't exist for ions). Heavily comment these swaps and decide how to properly fill things in from the CCTBX. Consider correcting the radius names and perhaps using a different one for ions.

Nigel made a new ionic-radius function. It returns results that are similar to, but not identical with, probe/reduce values. He and the Richardsons are discussing how to choose one over the other. All of the radii within the CCTBX Probe/Reduce have been modified to use the ionic radius, per discussions with Michael Prisant.

russell-taylor commented 3 years ago

@todo: New plan: do not try and fall back from CCTBX markings for Probe when we cannot load a file, simply fail to load that file.

@todo: Given that, add an optional command-line argument to original Reduce to dump the atom information along with radii and flags for each atom when it has completed processing (we can control the processing with the exiting command-line arguments). Add a similar argument (or just code) to new Reduce/Optimize or whatever to dump the same information in the same format. Then we can use scripts to compare these two dump files.

@todo: Once that is working as expected, we will either deprecate or remove the methods in AtomTypes.py, moving the Helpers.py:IsMetallic() table information into a more standard CCTBX place.

russell-taylor commented 3 years ago

The table of radii and donor/acceptor values is found in modules/chem_data/mon_lib/ener_lib.cif on the Phenix install.

russell-taylor commented 2 years ago

@todo: In 1bti, the HOH 411 and 419 have B factors larger than 40. This makes them be marked as 0 radius and not acceptors in the original. This makes it show up as different in the new code; perhaps we should check for invalid flags in the new one and do similar things to make comparison easier. In any case, we should verify that these are being skipped in the new code.

russell-taylor commented 2 years ago

Ran Probe vs. Probe2 regression testing on a number of files. Most of them produce the same results for all radii. At least one is still different. Some of the @todo above have been fixed along the way.