Hi, I was trying "Subgraphing to Protein Surface" and encountered the following error: the rsa feature was missing from some nodes' attributes.
Reproduce error
# basic
from functools import partial
from pathlib import Path
# graphein
from graphein.protein.graphs import construct_graph
from graphein.protein.features.nodes import rsa
from graphein.protein.edges import distance as D
from graphein.protein.config import ProteinGraphConfig, DSSPConfig
from graphein.protein.subgraphs import extract_surface_subgraph
# ---------- graph config ----------
params_to_change = {
    "granularity": "centroids",  # "atom", "CA", "centroids"
    "insertions": True,
    "edge_construction_functions": [
        # graphein.protein.edges.distance.add_peptide_bonds,
        D.add_distance_to_edges,
        D.add_hydrogen_bond_interactions,
        D.add_ionic_interactions,
        D.add_backbone_carbonyl_carbonyl_interactions,
        D.add_salt_bridges,
        # distance
        partial(D.add_distance_threshold, long_interaction_threshold=4, threshold=4.5),
    ],
    "dssp_config": DSSPConfig(executable="/usr/bin/mkdssp"),
    "graph_metadata_functions": [rsa],
}
config = ProteinGraphConfig(**params_to_change)
# ---------- input struct ----------
pdb_path = Path('input_pdb_cryst1.pdb')
g = construct_graph(config=config, path=pdb_path, verbose=False)
# ---------- surface subgraph ----------
RSA_THRESHOLD = 0.2
s_g = extract_surface_subgraph(g, RSA_THRESHOLD)
This leads to the following error:
ProteinGraphConfigurationError: RSA not defined for all nodes (H:TYR:52:A). Please ensure you have used graphein.protein.nodes.features.dssp.rsa as a graph annotation function.
Because I set insertions to True in the config, my node IDs contain insertion codes. However, the node_id column added at dssp.py#L139C1-L145C6 does not take insertions into account. As a result, in add_dssp_features at line 211, dict(dssp_df[feature]) maps H:TYR:100:A, H:TYR:100:B, etc. onto the same node_id key H:TYR:100, so they overwrite one another and the graph nodes that carry insertion codes never receive the feature.
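To make the collision concrete, here is a minimal plain-Python sketch of the mechanism. It does not use graphein or pandas; the record fields and the helper `node_id_without_icode` are hypothetical stand-ins for the DSSP dataframe and the node_id construction, so treat it as an illustration under those assumptions rather than the library's actual code.

```python
# Toy DSSP-like records. Field names are illustrative assumptions,
# not graphein's actual column names.
records = [
    {"chain": "H", "resname": "TYR", "resnum": 100, "icode": "A", "rsa": 0.10},
    {"chain": "H", "resname": "TYR", "resnum": 100, "icode": "B", "rsa": 0.75},
]


def node_id_without_icode(rec):
    """Hypothetical helper: build chain:resname:resnum, ignoring the insertion code."""
    return f"{rec['chain']}:{rec['resname']}:{rec['resnum']}"


# Both records produce the same key, so the second entry replaces the first.
rsa_by_node = {node_id_without_icode(r): r["rsa"] for r in records}

# Node IDs as they appear in a graph built with insertions=True.
graph_nodes = ["H:TYR:100:A", "H:TYR:100:B"]
missing = [n for n in graph_nodes if n not in rsa_by_node]

print(rsa_by_node)  # a single key, "H:TYR:100"
print(missing)      # both insertion-coded nodes lack an rsa entry
```

Neither graph node matches the collapsed key, which is consistent with the "RSA not defined for all nodes" error above.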
So adding the following lines (adapted from label_node_id) right after dssp.py#L139C1-L145C6 fixed it
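The exact lines are not shown here. As a rough illustration of the idea only (not the actual patch and not graphein's code), the plain-Python sketch below appends the insertion code to the node ID when one is present. The record fields and the helper `node_id_with_icode` are hypothetical.

```python
# Toy DSSP-like records. Field names are illustrative assumptions,
# not graphein's actual column names.
records = [
    {"chain": "H", "resname": "TYR", "resnum": 52, "icode": "", "rsa": 0.30},
    {"chain": "H", "resname": "TYR", "resnum": 100, "icode": "A", "rsa": 0.10},
    {"chain": "H", "resname": "TYR", "resnum": 100, "icode": "B", "rsa": 0.75},
]


def node_id_with_icode(rec):
    """Hypothetical helper: append the insertion code only when one is set."""
    base = f"{rec['chain']}:{rec['resname']}:{rec['resnum']}"
    icode = rec.get("icode", "").strip()
    return f"{base}:{icode}" if icode else base


# Each residue now gets its own key, so no entry is overwritten.
rsa_by_node = {node_id_with_icode(r): r["rsa"] for r in records}

for node_id, value in rsa_by_node.items():
    print(node_id, value)
```

With distinct keys per insertion code, every record keeps its own rsa value. Whether graphein's node IDs use exactly this suffix format should be checked against label_node_id.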