Closed davidkastner closed 6 months ago
Hi @davidkastner, good catch. This is a slightly tricky issue to resolve.
I think the omission of the water nodes comes from here: https://github.com/a-r-j/graphein/blob/6dae5ff114a40410566f6fea4e558b2b9a6ba580/graphein/protein/graphs.py#L199
where we select CA atoms to count as nodes. I think if you use granularity="atom", the waters will be present.
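To illustrate why a CA-based selection drops waters, here is a minimal sketch on a toy table. It assumes BioPandas-style column names (`record_name`, `atom_name`, `residue_name`, `residue_number`) and made-up rows; it mimics the filtering idea only and is not the actual Graphein code.

```python
import pandas as pd

# Toy BioPandas-style table (made-up rows): two atoms of one protein
# residue and a single water oxygen.
atoms = pd.DataFrame(
    {
        "record_name": ["ATOM", "ATOM", "HETATM"],
        "atom_name": ["N", "CA", "O"],
        "residue_name": ["SER", "SER", "HOH"],
        "residue_number": [2, 2, 201],
    }
)

# Residue-level idea: keep one alpha-carbon row per residue.
# A water has no atom named "CA", so it cannot survive this filter.
residue_nodes = atoms[atoms["atom_name"] == "CA"]

# Atom-level idea: every row becomes a node, so the water oxygen is kept.
atom_nodes = atoms

waters_at_residue_level = int((residue_nodes["residue_name"] == "HOH").sum())
waters_at_atom_level = int((atom_nodes["residue_name"] == "HOH").sum())
print(waters_at_residue_level, waters_at_atom_level)
```

In this toy case the residue-level selection keeps no water rows, while the atom-level selection keeps the one water oxygen.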
For heteroatoms it can be tricky to consistently and universally define what the coarsened node should be. I think a good heuristic could be the centre of mass (CoM) of the ligand for coarsened graphs. One workaround would be to write your own hetatm_df_processing_func to manipulate the hetatm df so that it contains a representative "CA".
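A rough sketch of such a processing function is below. It collapses each HETATM residue into a single pseudo-"CA" row placed at its centre of mass. The function name `hetatm_to_pseudo_ca`, the `ATOMIC_MASS` table, and the toy data are all hypothetical. The column names assume the BioPandas convention. How the function gets passed to Graphein is not shown here, so check that against the version you have installed.

```python
import pandas as pd

# Approximate atomic masses (illustrative table; extend as needed).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "P": 30.974, "S": 32.06}
COORDS = ["x_coord", "y_coord", "z_coord"]
GROUP_KEYS = ["chain_id", "residue_number", "residue_name"]


def hetatm_to_pseudo_ca(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse each HETATM residue to one row at its centre of mass, named 'CA'."""
    rows = []
    for _, group in df.groupby(GROUP_KEYS, sort=False):
        masses = group["element_symbol"].map(ATOMIC_MASS)
        if masses.isna().any():
            raise ValueError("Unknown element in HETATM residue; extend ATOMIC_MASS")
        # Mass-weighted mean of the coordinates.
        com = group[COORDS].mul(masses, axis=0).sum() / masses.sum()
        row = group.iloc[0].to_dict()  # reuse residue-level metadata
        row["atom_name"] = "CA"        # representative atom for CA-based selection
        for axis in COORDS:
            row[axis] = float(com[axis])
        rows.append(row)
    return pd.DataFrame(rows)


# Toy input: one water and a hypothetical two-atom ligand "LIG".
hetatm = pd.DataFrame(
    {
        "record_name": ["HETATM"] * 3,
        "atom_name": ["O", "C1", "O1"],
        "residue_name": ["HOH", "LIG", "LIG"],
        "chain_id": ["A", "A", "A"],
        "residue_number": [201, 301, 301],
        "element_symbol": ["O", "C", "O"],
        "x_coord": [1.0, 0.0, 1.0],
        "y_coord": [2.0, 0.0, 0.0],
        "z_coord": [3.0, 0.0, 0.0],
    }
)
pseudo_ca = hetatm_to_pseudo_ca(hetatm)
print(pseudo_ca[["residue_name", "atom_name"] + COORDS])
```

For a single-atom water the centre of mass is just the oxygen position, so this effectively renames the water oxygen to "CA". For larger ligands, whether the CoM is a sensible node position depends on the application.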
We looked into this quite extensively for protein-ligand graphs (see #164, mainly here: https://github.com/a-r-j/graphein/blob/d81fc2f77b3562f61f70f257ddf509d5102b8bf6/graphein/protein_ligand/graphs.py).
What's your application? Using graphein.protein.tensor.data.Protein should work reliably if it's ML-based.
Hi @a-r-j. I see the problem and agree it would be challenging to generalize! For my purposes, the atom representation will work well as I am building graphs for QM cluster models extracted from proteins. As the QM cluster models are small in size, the extra information afforded by the atom representation will be useful. I appreciate your response and will close the issue as resolved but hopefully it will be a useful point of reference for others.
Describe the bug
The config parameter keep_hets is currently not working. It seems keep_hets was recently updated from a bool to a list of strings containing the residue names of the HETATM residues to keep, such as keep_hets=["HOH"]. However, after updating the parameter, the specified residues are not included in the graph. The tutorial installed the newest version, graphein-1.7.6, and I haven't had a chance to back-test other versions to see when the keep_hets functionality broke, but I will update this ticket when I have a chance.
To Reproduce
This can be seen in the tutorial example of 3EIY, which contains 112 waters. However, when we run the example, none of the waters are included in the graph. If we print the nodes with g.nodes() and look at the last residues, we see that no waters were included.
Expected behavior
If I understand correctly, the expected behavior of keep_hets would be for the waters to now be included in the graph representation.
Screenshots
Here is a screenshot of the representation of 3EIY, where we can see only the protein residues included.
Desktop (please complete the following information):
This was reproduced using the Google Colab notebook with graphein-1.7.6 installed. No other modifications were made to the tutorial.