Closed universvm closed 1 year ago
Hey @universvm
Thanks for flagging this. It appears the atomic granularity wasn't considered when this was first implemented. Fortunately, the solution is quite simple:
def add_sequence_distance_edges(
G: nx.Graph, d: int, name: str = "sequence_edge"
) -> nx.Graph:
"""
Adds edges based on sequence distance to residues in each chain.
Eg. if ``d=6`` then we join: nodes ``(1,7), (2,8), (3,9)..``
based on their sequence number.
:param G: Networkx protein graph.
:type G: nx.Graph
:param d: Sequence separation to add edges on.
:param name: Name of the edge type. Defaults to ``"sequence_edge"``.
:type name: str
:return G: Networkx protein graph with added peptide bonds.
:rtype: nx.Graph
"""
# Iterate over every chain
for chain_id in G.graph["chain_ids"]:
# Find chain residues
chain_residues = [
(n, v) for n, v in G.nodes(data=True) if v["chain_id"] == chain_id
]
# THIS IS THE CHANGE <------------------------------------------------------------
if G.graph["config"].granularity == "atom" and name == "peptide_bond":
chain_residues = [
(n, v)
for n, v in chain_residues
if v["atom_type"] in {"N", "C"}
]
# END CHANGE
# Iterate over every residue in chain
for i, residue in enumerate(chain_residues):
try:
# Checks not at chain terminus - is this versatile enough?
if i == len(chain_residues) - d:
continue
# Asserts residues are on the same chain
cond_1 = (
residue[1]["chain_id"]
== chain_residues[i + d][1]["chain_id"]
)
# Asserts residue numbers are adjacent
cond_2 = (
abs(
residue[1]["residue_number"]
- chain_residues[i + d][1]["residue_number"]
)
== d
)
# If this checks out, we add a peptide bond
if (cond_1) and (cond_2):
# Adds "peptide bond" between current residue and the next
if G.has_edge(i, i + d):
G.edges[i, i + d]["kind"].add(name)
else:
G.add_edge(
residue[0],
chain_residues[i + d][0],
kind={name},
)
except IndexError:
continue
return G
I flagged the change. I think this should be a robust solution. Would you mind double checking & seeing if you get the expected behaviour? If so, I'll make a PR tonight.
My example is:
import graphein.protein as gp
config = gp.ProteinGraphConfig(
granularity="atom",
edge_construction_functions=[
gp.add_atomic_edges, gp.add_bond_order, gp.add_peptide_bonds
]
)
g = gp.construct_graph(
pdb_code="3eiy",
config=config
)
for u,v,d in g.edges(data=True):
if "peptide_bond" in d["kind"]:
print(u, v, d)
And outputs:
A:SER:2:C A:PHE:3:N {'kind': {'peptide_bond'}, 'bond_length': 1.3357642007480208, 'distance': 1.3357642007480208}
A:PHE:3:C A:SER:4:N {'kind': {'peptide_bond'}, 'bond_length': 1.3344489499415109, 'distance': 1.3344489499415109}
A:SER:4:C A:ASN:5:N {'kind': {'peptide_bond'}, 'bond_length': 1.3296875572855444, 'distance': 1.3296875572855444}
A:ASN:5:C A:VAL:6:N {'kind': {'peptide_bond'}, 'bond_length': 1.3246860005299383, 'distance': 1.3246860005299383}
A:VAL:6:C A:PRO:7:N {'kind': {'peptide_bond'}, 'bond_length': 1.3450029739744072, 'distance': 1.3450029739744072}
A:PRO:7:C A:ALA:8:N {'kind': {'peptide_bond'}, 'bond_length': 1.3230517752529567, 'distance': 1.3230517752529567}
A:ALA:8:C A:GLY:9:N {'kind': {'peptide_bond'}, 'bond_length': 1.3327888054751955, 'distance': 1.3327888054751955}
...
Hey @a-r-j , I'll be working on it tomorrow and let you know, it seems to be good!
Hi @a-r-j ,
Thanks for the fix, it works!
('A:CYS:0:C', 'A:ILE:1:N') {'kind': {'peptide_bond'}, 'bond_length': 1.247632958846471, 'distance': 1.247632958846471}
('A:ILE:1:C', 'A:VAL:2:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2540534278889397, 'distance': 1.2540534278889397}
('A:VAL:2:C', 'A:ARG:3:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2770696926949603, 'distance': 1.2770696926949603}
('A:ARG:3:C', 'A:ALA:4:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2228973791778277, 'distance': 1.2228973791778277}
('A:ALA:4:C', 'A:PRO:5:N') {'kind': {'peptide_bond'}, 'bond_length': 1.248942352552751, 'distance': 1.248942352552751}
('A:PRO:5:C', 'A:GLY:6:N') {'kind': {'peptide_bond'}, 'bond_length': 1.272465716630511, 'distance': 1.272465716630511}
('A:GLY:6:C', 'A:ARG:7:N') {'kind': {'peptide_bond'}, 'bond_length': 1.3055596501117825, 'distance': 1.3055596501117825}
('A:ARG:7:C', 'A:ALA:8:N') {'kind': {'peptide_bond'}, 'bond_length': 1.3029236355212837, 'distance': 1.3029236355212837}
('A:ALA:8:C', 'A:ASP:9:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2903650646231868, 'distance': 1.2903650646231868}
('A:ASP:9:C', 'A:MET:10:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2767478216155297, 'distance': 1.2767478216155297}
('A:MET:10:C', 'A:ARG:11:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2764051081063572, 'distance': 1.2764051081063572}
('A:ARG:11:C', 'A:PHE:12:N') {'kind': {'peptide_bond'}, 'bond_length': 1.2014732622909259, 'distance': 1.2014732622909259}
Now resolved in 1.6.0
pip install graphein=1.6.0
Describe the bug As per title, using "add_peptide_bonds" in atomic graphs leads to unrealistic sizes of bonds. Eg:
To Reproduce I'm using this config
Expected behavior I believe it should probably return an error or return peptide bonds between the carboxyl and amino group, though here they don't seem to have much of a logic for the starting atom.
Screenshots If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Additional context Add any other context about the problem here.