ReliaSolve / Molprobity2

0 stars 0 forks source link

Add additional testing found in Probe beyond what is in Reduce for probe score determination #106

Closed russell-taylor closed 3 years ago

russell-taylor commented 3 years ago

Probe returns a 2D array of dotNode results from examineDots() rather than a set of sums of different types of interaction scores:

typedef struct dotNode_t {
    struct dotNode_t *next; /* link to next dot */
    atom *a;       /* dot's owner  */
    atom *t;       /* dot's cause  */
    point3d loc;   /* dot position */
    point3d spike; /* end of spike */
    int type;      /* -1 bump, 0 touch, +1 H bond */
    float gap;     /* vdw-vdw distance */
    char ptmaster; /* point master, kinemage output, (M for mc) dcr041009*/
    int dotCount;  /*added by SJ -10/07/2011 for keeping count of number of dots in the interaction, used only in writeRaw and Condense functions*/
    float angle;   /*dcr20120120 angle: dot's: parent, self, cause, esp for H */
} dotNode;

The first index of the array is for each atom type (NUMATOMTYPES, from atomprops.h:atomIdentifiers). Warning: The atom type gets replaced by the nucleic acid base type (see dotType() unless called by updateHydrogenInfo() when ByNABaseColor is true).

The second index is one node for each contact type (NODEWIDTH, see next comment).

These are determined in examineDots() after the closest contact atom and distance are determined, so can remain in the Python code rather than the C++ code.

This array is traversed by the enumerate() routine to compute and sum up the interaction scores from the individual dots (these are the calculations that are done inside the score_dots method in the cctbx/Probe code).

It is also used to list all of the dots in other output formats.

These are determined in examineDots() after the closest contact atom and distance are determined, so can remain in the Python code rather than the C++ code.

russell-taylor commented 3 years ago

Probe deals with seven different contact types (weak H-bond and worse overlap are only broken out when LweakHBonds and LworseOverlaps are true; they are false by default. The type is determined by dotClassIndex(type, minGap):

It produces a list of results of each type for every node, with NODEWIDTH being used to indicate how many contact types there are. There is one node for each atom type (NUMATOMTYPES).

This is distinct from the overlap type (-1 bump, 0 touch, 1 hydrogen bond).

These are determined in examineDots() after the closest contact atom and distance are determined, so can remain in the Python code rather than the C++ code.

russell-taylor commented 3 years ago

The default behavior in Probe is to ignore mainchain-mainchain comparisons for the same residue. This is filtered within the examineDots() method, it avoids comparisons between atoms that are both marked as mainchain and that both come from the same residue (and neither is HET). This might be handled by adding these to the excluded list for each atom when that atom is on the mainchain, moving the logic into the outer loop in the CCTBX Probe.

Reduce only compares hydrogens and other moving side-chain atoms, so does not need to test for mainchain-mainchain collisions.

CCTBX seems to handle main chain determination by doing a selection based on atom names: ['C', 'O', 'CA', 'N', 'CB'], are the main-chain atoms, but select.c in Probe also seems to mark hydrogens (or deuteriums) bonded to the main chain.

These can be determined in genDotIntersect() as part of setting up the list to pass to examineDots(), so can remain in the Python code rather than the C++ code. It can simply not put these neighboring atoms into the list of those to be examined.

Dorothee says that there used to be a "smart selection" to ask for mainchain atoms. There have been some changes recently, so this may have gone away. She will check.

russell-taylor commented 3 years ago

Unless the DoWatWat flag is set to be true, the default behavior of Probe is to ignore comparisons between atoms when both of them are water. This is filtered within the examineDots() method. This might be handled by adding these to the excluded list for each atom when the source atom is a water, moving the logic into the outer loop in the CCTBX Probe.

These can be determined in genDotIntersect() as part of setting up the list to pass to examineDots(), so can remain in the Python code rather than the C++ code. It can simply not put these neighboring atoms into the list of those to be examined.

There is a selection for asking whether an atom is a water. We can use that to either get a list of them or a boolean vector.

russell-taylor commented 3 years ago

Probe uses a targFlg target flag to indicate which atoms should be targets for collisions. It flags all of the ones that could be targets before calling examineDots(). It is set in selectSource() based on the target pattern.

CCTBX Probe could handle this either by only putting these atoms into the spatial-query structure or by adding all non-target atoms to the excluded list. Better to do the selection and only put potential target atoms into the spatial query.

Probe already uses a mark flag to indicate bonded atoms, which we will handle by putting into the non-bonded list.

This can be handled in the Python code.

russell-taylor commented 3 years ago

Probe adjusts its spike lengths by hydrogen distances when there is hydrogen overlap the same way Reduce does, but it also does them for the individual dots that it returns. Because we ended up breaking the code into an inner per-dot check this will naturally flow back to the Probe Python code.

russell-taylor commented 3 years ago

Probe keeps track of dots by atom element (type) and by contact/overlap type (see above), with overlap type stored inside the dotNode and atom types maintained on different lists.

These are determined in examineDots() after the closest contact atom and distance are determined, so can remain in the Python code rather than the C++ code.

russell-taylor commented 3 years ago

Probe has switches with comments indicating that it needs to test the acceptance angle of hydrogen bonds "in some cases" (when dealing with hydrogen bonds to aromatic rings?), but there is no actual code within the if/else statement to actually do this. The TEST_ACCEPT_ANGLE_PROP flag is used to do that, and it is set in the select.c code both in tables and in some functions.

Reduce deals with aromatic rings in generateWaterPhantomHs but not with acceptance angles during probe score determination. It also checks for aromatic flags in genHydrogens() to decide whether to insert aromatic methyls.

Reduce has a compile flag AROMATICS_ACCEPT_HBONDS that is set, and it has a MAXAromdih to set the maximum dihedral angle that is used inside isAromMethyl() to determine whether "the hydrogen atom in methyl group stemmed from aromatic ring".

Reduce also deals with aromatics in FlipMemo::Rename_Flip().

Another issue has been created to implement the feature described in the comments.

russell-taylor commented 3 years ago

Probe has a CHOHBfactor that is used to scale the visible spike length for CH..O bonds. This scaling is on top of the spikelen scaling. It happens when there is a CH_DONOR_PROP flag set on at least one side of an interaction for particular hydrogen bonds. The default for this factor is 0.5 but it can be set on the command line.

CH_DONOR_PROP is set on HD2, HE1, DD2, DE1 on HIS; on H2 and H8 of ": A:ADE:ATP:ADP:AMP:1MA:RIA:T6A: I: DA: DI: AR:" and on H8 of ": G:GUA:GTP:GDP:GMP:GSP:OMG:1MG:2MG:M2G:7MG: YG: DG: GR:".

However, this kind of Hydrogen is only treated as a hydrogen bond if there is a special argument set on the command line; otherwise, we never get to this case. By default, these hydrogens are not donors. The PermitCHXHB flag must be set using the -DOCHO option to have these considered at all. Only when they are considered do we have the possibility of handling them, and then they are scaled down by the specified factor.

A separate issue has been created for this.

russell-taylor commented 3 years ago

There are static arrays that keep track of counts of different types of interactions (mcmc, scsc, mcsc, other) along with a sum count that is separately constructed in countsummary(). mc = main chain, sc = side chain.

There are two "banks" of each of these, one for the current pass [bank 0] and another that can have the values of the current pass copied into it that is used to sum both banks when the Pass parameter for countsummary is 2 (it is set to 2 when not doing rawOutput and when OutputFmtType is kinemage or "oneline multi:count", which reports that it is intersecting both ways).

Each bank has an entry for each contact type and a final entry that is the sum of all of that interaction.

These values are incremented inside examineDots() and examineOneDotEach() and are used within writeOutput() and countsummary().

These are determined in examineDots() after the closest contact atom and distance are determined, so can remain in the Python code rather than the C++ code.

These will need to become an additional return value from one function that is passed into another. Because there are multiple calls to the functions that increment them, they will need to be passed by reference or else re-passed in and back out each time.

russell-taylor commented 3 years ago

(Skip below for non-H-bond interactions with water phantoms)

ptmaster is set based on interaction type (see line 3999+ of probe.c) and the appropriate static array count is incremented, then saveDot() is called with a modified spike location.

Note: examineOneDotEach() does additional work to determine parameters for saveDot().

These are determined in examineDots() after the closest contact atom and distance are determined, so can remain in the Python code rather than the C++ code.

russell-taylor commented 3 years ago

We were able to break the code into a first part before idx = dotClassIndex() and make that common to both Probe and Reduce, returning a vector of dot results that is the post-processed differently by the Probe code and Reduce code.

Note that examineOneDotEach() rebuilds a single dot per target not from the dot list and runs on it (sometimes dumping the dot location to standard error when Ldotdump is true -- can we ignore -DOTDUMP?). We may need to have a different loop for this type and both it and the other call a different per-dot core function. This function was intended to be a faster, approximate way to run. It needs to have code to smoothly ramp the dot reaction from completely on to completely off as a hydrogen bound to either the source or target dot comes between two heavy atoms that are being tested against each other. The print statements caused by -DOTDUMP are there to help debug the function.

russell-taylor commented 3 years ago

Probe has an adjustable dot point set for C=O Carbonyl atoms, which supports a command-line settable CORadScale, which converts it from 1.75 to 1.65 angstrom radius by default -- can we ignore -COSCALE?

There is a comment in Reduce from 7/3/98 JMW saying that the C=O radius changes from 1.75 to 1.65 but there is no corresponding 1.65 in the code, and the table entry for C=O specifies 1.70. It was changed in a later commit.

In any case, this will be handled when filling in the VDW radius structures, so will be in the Python code.

russell-taylor commented 3 years ago

examineDots() is called with a list of target atoms that is produced in genDotIntersect(), which may be more like the new CCTBX Probe code that searches neighbor lists; it calls findTouchingAtoms() to determine this and then sets markBonds() to indicate bonded atoms. It could set excluded atoms and such this way instead...

This is the only function that calls either examineDots() or examineOneDotEach(), so is indeed the root of the calling tree; it and the Reduce calls can call a common routine to handle checking a curated list of exact contact atoms.

findTouchingAtoms() uses a spatial hierarchy to do the query, which is what is done early on in the Reduce approach.

There are two sets of atoms involved in these calculations along with a hierarchy for each: (allMainAtoms/abins) and (allMovingAtoms/bbins). Comments state that these must be disjoint sets with no atoms in common (or allMovingAtoms empty). allMainAtoms is filled in by processCommandLine(). allMovingAtoms is empty except when called within movingDoCommand() which is called by autoBondRot(). When called from mainProbeProc(), it is called with empty allMovingAtoms. The auto-bond-rot command-line arguments are not being supported in the initial implementation, so this code path will not be needed for now.

russell-taylor commented 3 years ago

Probe examineDots() has a spikelen parameter that is used to scale the spikes compared to the minGap. It defaults to 0.5 but can be set on the command line. It is used to determine the overlaps; in Reduce this is always 0.5.

It looks like this scaling can be applied in examinDots() based on the returned overlap value; divide by 0.5 and multiply by spikelen to make it match what it would have been had it been multiplied internally.

Note: The signs of the overlaps may always be negative in Probe whereas they are positive or zero in Reduce.

russell-taylor commented 3 years ago

@todo Pull out the issues described above that have to do with Probe porting but not low-level testing into separate issues and then close this issue.

russell-taylor commented 3 years ago

@Joymaker Tagging you on this issue because it has a lot of notes that will be helpful when trying to port Probe to Python/CCTBX.