a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License
1.01k stars 126 forks source link

add_peptide_bonds at atomic graph adds unrealistic peptide bonds #253

Closed universvm closed 1 year ago

universvm commented 1 year ago

Describe the bug As per title, using "add_peptide_bonds" in atomic graphs leads to unrealistic sizes of bonds. Eg:

('A:CYS:0:SG', 'A:ILE:1:N')
{'kind': {'peptide_bond'}, 'distance': 4.136273927099123}
('A:ILE:1:CD1', 'A:VAL:2:N')
{'kind': {'peptide_bond'}, 'distance': 6.095431157842733}
('A:VAL:2:CG2', 'A:ARG:3:N')
{'kind': {'peptide_bond'}, 'distance': 4.734868424782256}
('A:ARG:3:NH2', 'A:ALA:4:N')
{'kind': {'peptide_bond'}, 'distance': 8.366510503190682}
('A:ALA:4:CB', 'A:PRO:5:N')
{'kind': {'peptide_bond'}, 'distance': 3.285273961178885}
('A:PRO:5:CD', 'A:GLY:6:N')
{'kind': {'peptide_bond'}, 'distance': 4.52409703697876}
('A:GLY:6:O', 'A:ARG:7:N')
{'kind': {'peptide_bond'}, 'distance': 2.0699007222569885}
('A:ARG:7:NH2', 'A:ALA:8:N')
{'kind': {'peptide_bond'}, 'distance': 8.57448756486357}
('A:ALA:8:CB', 'A:ASP:9:N')
{'kind': {'peptide_bond'}, 'distance': 3.597926347217241}
('A:ASP:9:OD2', 'A:MET:10:N')
{'kind': {'peptide_bond'}, 'distance': 5.105517897334217}
('A:MET:10:CE', 'A:ARG:11:N')
{'kind': {'peptide_bond'}, 'distance': 7.6633865229414075}
('A:ARG:11:NH2', 'A:PHE:12:N')
{'kind': {'peptide_bond'}, 'distance': 9.416556642425085}

To Reproduce I'm using this config

params_to_change = {
    "granularity": "atom",
    "edge_construction_functions": [
        add_atomic_edges,
        add_bond_order,
        add_peptide_bonds,
    ],
}

Expected behavior I believe it should probably return an error or return peptide bonds between the carboxyl and amino group, though here they don't seem to have much of a logic for the starting atom.

Screenshots If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

Additional context Add any other context about the problem here.

a-r-j commented 1 year ago

Hey @universvm

Thanks for flagging this. It appears the atomic granularity wasn't considered when this was first implemented. Fortunately, the solution is quite simple:

def add_sequence_distance_edges(
    G: nx.Graph, d: int, name: str = "sequence_edge"
) -> nx.Graph:
    """
    Adds edges based on sequence distance to residues in each chain.

    Eg. if ``d=6`` then we join: nodes ``(1,7), (2,8), (3,9)..``
    based on their sequence number.

    :param G: Networkx protein graph.
    :type G: nx.Graph
    :param d: Sequence separation to add edges on.
    :param name: Name of the edge type. Defaults to ``"sequence_edge"``.
    :type name: str
    :return G: Networkx protein graph with added peptide bonds.
    :rtype: nx.Graph
    """
    # Iterate over every chain
    for chain_id in G.graph["chain_ids"]:

        # Find chain residues
        chain_residues = [
            (n, v) for n, v in G.nodes(data=True) if v["chain_id"] == chain_id
        ]

        # THIS IS THE CHANGE <------------------------------------------------------------
        if G.graph["config"].granularity == "atom" and name == "peptide_bond":
            chain_residues = [
                (n, v)
                for n, v in chain_residues
                if v["atom_type"] in {"N", "C"}
            ]
         # END CHANGE

        # Iterate over every residue in chain
        for i, residue in enumerate(chain_residues):
            try:
                # Checks not at chain terminus - is this versatile enough?
                if i == len(chain_residues) - d:
                    continue
                # Asserts residues are on the same chain
                cond_1 = (
                    residue[1]["chain_id"]
                    == chain_residues[i + d][1]["chain_id"]
                )
                # Asserts residue numbers are adjacent
                cond_2 = (
                    abs(
                        residue[1]["residue_number"]
                        - chain_residues[i + d][1]["residue_number"]
                    )
                    == d
                )

                # If this checks out, we add a peptide bond
                if (cond_1) and (cond_2):
                    # Adds "peptide bond" between current residue and the next
                    if G.has_edge(i, i + d):
                        G.edges[i, i + d]["kind"].add(name)
                    else:
                        G.add_edge(
                            residue[0],
                            chain_residues[i + d][0],
                            kind={name},
                        )
            except IndexError:
                continue
    return G

I flagged the change. I think this should be a robust solution. Would you mind double checking & seeing if you get the expected behaviour? If so, I'll make a PR tonight.

My example is:

import graphein.protein as gp

config = gp.ProteinGraphConfig(
    granularity="atom",
    edge_construction_functions=[
        gp.add_atomic_edges, gp.add_bond_order, gp.add_peptide_bonds
    ]
)

g = gp.construct_graph(
    pdb_code="3eiy",
    config=config
)

for u,v,d in g.edges(data=True):
    if "peptide_bond" in d["kind"]:
        print(u, v, d)

And outputs:

A:SER:2:C A:PHE:3:N {'kind': {'peptide_bond'}, 'bond_length': 1.3357642007480208, 'distance': 1.3357642007480208}
A:PHE:3:C A:SER:4:N {'kind': {'peptide_bond'}, 'bond_length': 1.3344489499415109, 'distance': 1.3344489499415109}
A:SER:4:C A:ASN:5:N {'kind': {'peptide_bond'}, 'bond_length': 1.3296875572855444, 'distance': 1.3296875572855444}
A:ASN:5:C A:VAL:6:N {'kind': {'peptide_bond'}, 'bond_length': 1.3246860005299383, 'distance': 1.3246860005299383}
A:VAL:6:C A:PRO:7:N {'kind': {'peptide_bond'}, 'bond_length': 1.3450029739744072, 'distance': 1.3450029739744072}
A:PRO:7:C A:ALA:8:N {'kind': {'peptide_bond'}, 'bond_length': 1.3230517752529567, 'distance': 1.3230517752529567}
A:ALA:8:C A:GLY:9:N {'kind': {'peptide_bond'}, 'bond_length': 1.3327888054751955, 'distance': 1.3327888054751955}
...
universvm commented 1 year ago

Hey @a-r-j , I'll be working on it tomorrow and let you know, it seems to be good!

universvm commented 1 year ago

Hi @a-r-j ,

Thanks for the fix, it works!

('A:CYS:0:C', 'A:ILE:1:N') {'kind': {'peptide_bond'}, 'bond_length': 1.247632958846471, 'distance': 1.247632958846471}
('A:ILE:1:C', 'A:VAL:2:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2540534278889397, 'distance': 1.2540534278889397}
('A:VAL:2:C', 'A:ARG:3:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2770696926949603, 'distance': 1.2770696926949603}
('A:ARG:3:C', 'A:ALA:4:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2228973791778277, 'distance': 1.2228973791778277}
('A:ALA:4:C', 'A:PRO:5:N') {'kind': {'peptide_bond'}, 'bond_length': 1.248942352552751, 'distance': 1.248942352552751}
('A:PRO:5:C', 'A:GLY:6:N') {'kind': {'peptide_bond'}, 'bond_length': 1.272465716630511, 'distance': 1.272465716630511}
('A:GLY:6:C', 'A:ARG:7:N') {'kind': {'peptide_bond'}, 'bond_length': 1.3055596501117825, 'distance': 1.3055596501117825}
('A:ARG:7:C', 'A:ALA:8:N') {'kind': {'peptide_bond'}, 'bond_length': 1.3029236355212837, 'distance': 1.3029236355212837}
('A:ALA:8:C', 'A:ASP:9:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2903650646231868, 'distance': 1.2903650646231868}
('A:ASP:9:C', 'A:MET:10:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2767478216155297, 'distance': 1.2767478216155297}
('A:MET:10:C', 'A:ARG:11:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2764051081063572, 'distance': 1.2764051081063572}
('A:ARG:11:C', 'A:PHE:12:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2014732622909259, 'distance': 1.2014732622909259}
a-r-j commented 1 year ago

Now resolved in 1.6.0

pip install graphein=1.6.0