diamondspark closed this issue 2 years ago
Yep, you're right, @diamondspark. There was a bug in how inserted residues are removed from the dataframe. I've pushed a fix to a pending PR. I still need to write some tests, but I hope to get this merged soon!
There should actually be 238 nodes in the graph, since we remove the inserted residues.
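For readers unfamiliar with insertion codes, here is a minimal sketch of what "removing inserted residues" means at the PDB-file level. It assumes fixed-column ATOM/HETATM records, where the insertion code sits in column 27. Both `drop_inserted_residues` and `demo_atom` are hypothetical names for illustration; this is not Graphein's implementation, which works on a dataframe.

```python
def drop_inserted_residues(pdb_lines):
    """Keep every line except ATOM/HETATM records with a non-blank insertion code."""
    kept = []
    for line in pdb_lines:
        is_coord = line.startswith(("ATOM", "HETATM"))
        # Column 27 of the PDB format is index 26 in a zero-based string.
        icode = line[26:27].strip()
        if is_coord and icode:
            continue  # inserted residue such as 60A: skip it
        kept.append(line)
    return kept


def demo_atom(serial, resname, resseq, icode=""):
    """Build a truncated, column-aligned ATOM record (demonstration only)."""
    return f"ATOM  {serial:>5}  CA  {resname:>3} A{resseq:>4}{icode:1}"


lines = [
    demo_atom(1, "GLY", 60),
    demo_atom(2, "SER", 60, "A"),  # inserted residue 60A
    demo_atom(3, "ALA", 61),
]
kept = drop_inserted_residues(lines)
print(len(lines), "->", len(kept))  # 3 -> 2
```

Every downstream per-node feature has to be computed from the same filtered set, otherwise node counts and feature lengths can drift apart.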
Thank you for looking into this. The fix seems to work only partially. I have the following follow-up concerns:
g = construct_graph(config=config, pdb_code='1c5y')
now gives the following error:
~/anaconda3/envs/bar/lib/python3.8/site-packages/graphein/protein/edges/distance.py in add_ionic_interactions(G, rgroup_df)
230 condition1 = (
231 G.nodes[r1]["residue_name"] in POS_AA
232 and G.nodes[r2]["residue_name"] in NEG_AA
233 )
234 KeyError: 'residue_name'
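The KeyError above means at least one node reached by `add_ionic_interactions` has no `residue_name` attribute. A hedged way to see which nodes are affected is to scan the node attribute dicts. The helper `nodes_missing_attr` below is hypothetical (not part of Graphein); it accepts any iterable of `(node_id, attribute_dict)` pairs, which is the shape networkx returns from `G.nodes(data=True)`. The demo data is made up.

```python
def nodes_missing_attr(node_items, attr="residue_name"):
    """Return the IDs of nodes whose attribute dict lacks `attr`."""
    return [node_id for node_id, attrs in node_items if attr not in attrs]


# Stand-in data; with a real graph, pass g.nodes(data=True) instead.
demo_nodes = [
    ("A:LYS:10", {"residue_name": "LYS"}),
    ("A:ASP:11", {}),  # attribute-less node, the kind that raises the KeyError
]
print(nodes_missing_attr(demo_nodes))  # ['A:ASP:11']
```

If the returned list is non-empty, those node IDs are a good starting point for checking whether they correspond to inserted residues in the PDB file.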
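from graphein.ml.conversion import GraphFormatConvertor
from graphein.protein.config import DSSPConfig, ProteinGraphConfig
from graphein.protein.edges.distance import (
    add_aromatic_sulphur_interactions, add_cation_pi_interactions,
    add_hydrogen_bond_interactions, add_hydrophobic_interactions,
    add_ionic_interactions, add_peptide_bonds,
)
from graphein.protein.features.nodes.amino_acid import (
    expasy_protein_scale, meiler_embedding,
)
from graphein.protein.graphs import construct_graph

configs = {
    "granularity": "CA",
    "keep_hets": False,
    "insertions": False,
    "verbose": False,
    "dssp_config": DSSPConfig(),
    # Duplicate key in the original dict; Python keeps only the last value, so this one was ignored:
    # "pdb_dir":'/groups/cherkasvgrp/Student_backup/mkpandey/My_Projects/Drug_Protein_Interaction_Project1/ER_AR_project/data/PDBBind/pdbbind_v2016_refined/refined-set/',
    "pdb_dir": './data/prot/PDB/',
    "node_metadata_functions": [meiler_embedding, expasy_protein_scale],
    "edge_construction_functions": [add_peptide_bonds,
                                    add_hydrogen_bond_interactions,
                                    add_ionic_interactions,
                                    add_aromatic_sulphur_interactions,
                                    add_hydrophobic_interactions,
                                    add_cation_pi_interactions]
}
config = ProteinGraphConfig(**configs)
format_convertor = GraphFormatConvertor(
    'nx', 'pyg', verbose='all_info',
    columns=['edge_index', 'meiler', 'coords', 'expasy', 'node_id', 'name', 'dist_mat', 'num_nodes'],
)
g = construct_graph(config=config, pdb_code='6OGE')
protdata = format_convertor(g)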
This yields mismatched lengths: meiler and expasy have 1483 entries, while node_id and num_nodes report 1487:
Data(edge_index=[2, 2472], node_id=[1487], coords=[1], meiler=[1483], expasy=[1483], name=[1], dist_mat=[1], num_nodes=1487)
@a-r-j Can you please look into this again? Thank you!
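A quick way to flag this kind of mismatch before the data reaches a model is to compare every per-node field against `num_nodes`. The sketch below assumes each per-node field is a sequence with one entry per node; `find_length_mismatches` is a hypothetical helper, not part of Graphein or PyTorch Geometric. The lengths are taken from the `Data(...)` repr above, and the values are placeholders.

```python
def find_length_mismatches(num_nodes, per_node_fields):
    """Map field name -> length for every field whose length differs from num_nodes."""
    return {
        name: len(values)
        for name, values in per_node_fields.items()
        if len(values) != num_nodes
    }


# Lengths copied from the Data(...) repr; the contents are placeholders.
fields = {
    "node_id": [None] * 1487,
    "meiler": [None] * 1483,
    "expasy": [None] * 1483,
}
print(find_length_mismatches(1487, fields))  # {'meiler': 1483, 'expasy': 1483}
```

An empty dict means all per-node fields agree with the node count.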
Hi, @diamondspark. I'm working on this. It's a tricky problem resulting from insertions and alt_locs in the PDB files. These aren't always consistently represented in the file, so it's hard to come up with a robust approach that catches all the corner cases.
While I'm figuring it out, you can try pre-processing the PDBs with the excellent PDB Tools.
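If installing extra tooling is not an option, the alt_loc part of that pre-processing can be roughly sketched in plain Python. This assumes fixed-column ATOM/HETATM records with the alternate location indicator in column 17, and it uses the simplifying policy of keeping only atoms whose indicator is blank or "A"; real files may use other labels, so treat the `keep` default as an assumption. `keep_primary_altloc` and `demo_atom` are hypothetical names, and this is not how PDB Tools is implemented.

```python
def keep_primary_altloc(pdb_lines, keep=(" ", "A")):
    """Drop ATOM/HETATM records whose altLoc (column 17) is not in `keep`."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Column 17 is index 16; treat a too-short line as blank altLoc.
            altloc = line[16:17] or " "
            if altloc not in keep:
                continue
        kept.append(line)
    return kept


def demo_atom(serial, altloc=" "):
    """Build a truncated, column-aligned ATOM record (demonstration only)."""
    return f"ATOM  {serial:>5}  CA {altloc:1}SER A  60 "


lines = [demo_atom(1), demo_atom(2, "A"), demo_atom(3, "B")]
cleaned = keep_primary_altloc(lines)
print(len(lines), "->", len(cleaned))  # 3 -> 2
```

Running a filter like this before graph construction gives each atom a single position, which removes one source of duplicated or inconsistent residues.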
Hi @diamondspark, I believe this is resolved in 1.1.1. Try it out and do let me know if not!
Hi @a-r-j
I've been observing that sometimes the number of nodes generated by the library differs from the coordinate data it generates by exactly 1 node. Do you know why this happens? Here is an example:
protdata.num_nodes == 256 ; protdata.coords[0].shape==257
Shouldn't these 2 be the same? What am I missing? Thank you!