Electrostatics / pdb2pqr

PDB2PQR - determining titration states, adding missing atoms, and assigning charges/radii to biomolecules.
http://www.poissonboltzmann.org/
Other
124 stars 34 forks source link

Error in the construction of a network of hydrogen bonds #200

Closed boristim66 closed 3 years ago

boristim66 commented 3 years ago

The construction of a network of hydrogen bonds does not work for atoms of the same residues with the same numbers, but belonging to different chains due sort error : 3hto_chainsOB.zip $ pdb2pqr30 --ff AMBER --with-ph 4.5 --drop-water --keep-chain "3hto_chainsOB.pdb" "3hto_chainsOB.pqr" ............ DEBUG: Optimizeable to optimizeable DEBUG: Water to Water DEBUG:Starting network ASN B 595 (Flip), ASN D 595 (Flip), ASN F 595 (Flip) DEBUG: Optimizeable to backbone Traceback (most recent call last): File "/usr/local/bin/pdb2pqr30", line 8, in sys.exit(main()) File "/usr/local/lib/python3.7/dist-packages/pdb2pqr/main.py", line 796, in main main_driver(args) File "/usr/local/lib/python3.7/dist-packages/pdb2pqr/main.py", line 765, in main_driver is_cif=is_cif, File "/usr/local/lib/python3.7/dist-packages/pdb2pqr/main.py", line 641, in non_trivial hydrogen_routines.optimize_hydrogens() File "/usr/local/lib/python3.7/dist-packages/pdb2pqr/hydrogens/init.py", line 485, in optimize_hydrogens hbondlist = util.sort_dict_by_value(hbondmap) File "/usr/local/lib/python3.7/dist-packages/pdb2pqr/utilities.py", line 33, in sort_dict_by_value items.sort() TypeError: '<' not supported between instances of 'PotentialBond' and 'PotentialBond'

sobolevnrm commented 3 years ago

Thank you for letting us know about this problem; we will investigate.

intendo commented 3 years ago

@Eo300 and I worked on this today. The underlying problem is in the sorting of the dictionary where the keys are PotentialBond (class object), and the values are hydrogen bond distances (floating point).

The utilities.py:sort_dict_by_value function tries to sort the dictionary by the hydrogen bond distances and return a list of PotentialBond objects in descending order.

The "problem" is that there are multiple entries in the dictionary that have the exact same floating point values so the sort tries to sort the PotentialBonds and there is no __lt__ (dunder) function for the class.

We changed the sort_dict_by_value by adding a try/except around the sort like the following:

    # items.sort()
    try:
        items.sort()
    except TypeError as err:
        print(f"ERROR: {err}:")
        for k, v in items:
            print(f"    Key = {k}, Value = {str(v)}")

Printing out the values when there is a conflict (e.g., the distances in the dictionary are identical) gives the output:

INFO:Optimizing hydrogen bonds
ERROR: '<' not supported between instances of 'PotentialBond' and 'PotentialBond':
    Key = 3.321664629338058, Value = ND2 ASN B 595 to O VAL B 591 (3.32 A)
    Key = 4.017931557406128, Value = ND2FLIP ASN B 595 to O VAL B 591 (4.02 A)
    Key = 3.321664629338058, Value = ND2 ASN D 595 to O VAL D 591 (3.32 A)
    Key = 4.017931557406129, Value = ND2FLIP ASN D 595 to O VAL D 591 (4.02 A)
    Key = 3.321664629338058, Value = ND2 ASN F 595 to O VAL F 591 (3.32 A)
    Key = 4.017931557406128, Value = ND2FLIP ASN F 595 to O VAL F 591 (4.02 A)
ERROR: '<' not supported between instances of 'PotentialBond' and 'PotentialBond':
    Key = 3.2402754512541003, Value = OD1FLIP ASN B 595 to ND2FLIP ASN D 595 (3.24 A)
    Key = 3.4139168860271383, Value = OD1 ASN B 595 to ND2 ASN F 595 (3.41 A)
    Key = 3.468977483576864, Value = OD1 ASN B 595 to ND2FLIP ASN D 595 (3.47 A)
    Key = 3.6002421921432504, Value = OD1 ASN B 595 to ND2FLIP ASN F 595 (3.60 A)
    Key = 3.4139168860271383, Value = ND2 ASN B 595 to OD1 ASN D 595 (3.41 A)
    Key = 3.6002421921432504, Value = ND2FLIP ASN B 595 to OD1 ASN D 595 (3.60 A)
    Key = 3.468977483576864, Value = ND2FLIP ASN B 595 to OD1 ASN F 595 (3.47 A)
    Key = 3.2402754512541003, Value = ND2FLIP ASN B 595 to OD1FLIP ASN F 595 (3.24 A)
    Key = 3.468977483576864, Value = OD1 ASN D 595 to ND2FLIP ASN F 595 (3.47 A)
    Key = 3.2402754512541003, Value = OD1FLIP ASN D 595 to ND2FLIP ASN F 595 (3.24 A)
    Key = 3.413916886027138, Value = ND2 ASN D 595 to OD1 ASN F 595 (3.41 A)
    Key = 3.6002421921432504, Value = ND2FLIP ASN D 595 to OD1 ASN F 595 (3.60 A)

Neither @Eo300 or I know if this is a problem in the data (e.g. duplicated values or a circular list of values in the hydrogen bonds).

If the problem is not in the data, the options below will resolve the failure:

NOTE: In either case, the items will result with duplicate values in the returned list. For example, using the EASY option, the first ERROR above would return the following list of PotentialBond items:

    Item = ND2FLIP ASN D 595 to O VAL D 591 (4.02 A)
    Item = ND2FLIP ASN B 595 to O VAL B 591 (4.02 A)
    Item = ND2FLIP ASN F 595 to O VAL F 591 (4.02 A)
    Item = ND2 ASN B 595 to O VAL B 591 (3.32 A)
    Item = ND2 ASN D 595 to O VAL D 591 (3.32 A)
    Item = ND2 ASN F 595 to O VAL F 591 (3.32 A)

If the return value is acceptable, I would vote that we implement the EASY options. If not, we would need to know how to sort the PotentialBond classes to implement the __lt__ function.

@sobolevnrm, which option do you think we should implement or do you have another option based on these findings?

boristim66 commented 3 years ago

In this model, the biological unit (trimer of influenza hemagglutinin) is mathematically assembled from three asymmetric units, so there are complete matches of distances. I understand that to solve my problem, it is enough to add a jitter to the coordinates of atoms. However, there remains a very small possibility that, when used intensively in the future, identical distances between atoms may occur. I very hope you can easy solve this problem

sobolevnrm commented 3 years ago

The EASY solution from @intendo sounds reasonable to me. @boristim66 did the output above look OK to you?

boristim66 commented 3 years ago

As a pdb2pqr user, of course, it will suit me. As a programmer, I would ask if a unique sort key is later used to search the array.

intendo commented 3 years ago

@boristim66, how would you want the following sorted:

    Item = ND2FLIP ASN D 595 to O VAL D 591 (4.02 A)
    Item = ND2FLIP ASN B 595 to O VAL B 591 (4.02 A)
    Item = ND2FLIP ASN F 595 to O VAL F 591 (4.02 A)
    Item = ND2 ASN B 595 to O VAL B 591 (3.32 A)
    Item = ND2 ASN D 595 to O VAL D 591 (3.32 A)
    Item = ND2 ASN F 595 to O VAL F 591 (3.32 A)

We would probably need to sort by the first atom.name and then the first atom.residue followed by the second atom.name and then the second atom.residue. If so, that shouldn't be too difficult to implement.

boristim66 commented 3 years ago

It's look nice.

sobolevnrm commented 3 years ago

This problem also occurs for 7d1t.

intendo commented 3 years ago

We were able to fix the original problem by fixing the code that does the sorting. We ran into another problem with the actual 3hto_chainsOB.pdb file. The residue sequence numbers had values of 132A in columns 24-27 on lines 1003-1009. According to the PDB format for ATOM, those values have to be integer.

@boristim66 , you will need to resolve the formatting issue and then try again.

boristim66 commented 3 years ago

According PDB specificatinon, see your ATOM reference, ... 23 - 26 Integer resSeq Residue sequence number. 27 AChar iCode Code for insertion of residues. ... Position 27 is frequently used for symbols, idicating volatile regions of AA, where insertions occured to preserve common numeration of protein. At least for flu(influenza) proteins, this is widely used.

PS. As an example, look, please, 4fqk where are multiple regions of insertions presents. Unfortunately, developers often forget about this pdb feature.

intendo commented 3 years ago

@boristim66 , I apologize, you are correct. I fixed the original sort problem and introduced another problem that was fixed and merged into the main branch today. I was able to use the source code that is currently on the main branch on your file and get an output pqr file.

Please download the source from main and let us know if this version resolves the problem.

boristim66 commented 3 years ago

@intendo, All works fine, thank you very much. I will analyze the results.