a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Contacts #21

Closed: lerachel closed this issue 3 years ago

lerachel commented 4 years ago

Hi Arian,

I'm just wondering if there's a way to create a protein graph from dgl_graph_from_pdb_code() without needing the contact file generated by getContacts?

I have tried several ways to install and run getContacts, but I couldn't install vmd-python and run its module. The Conda installation of vmd-python succeeded, but I kept getting a "module name "vmd" not found" error. Installing from source using vmd-python's GitHub repo failed with "RuntimeError: Could not find include file 'netcdf.h' in standard include directories. Update $INCLUDE to include the directory containing this file, or make sure it is present on your system".

If Graphein depends on getContacts and VMD-Python to generate an all-atom graph, anything wrong with those two dependencies will cause problems. I also have difficulty installing tk version 8.5 because it's no longer available as an Anaconda package.

If you know a better way to install and run getContacts (and vmd-python), please let me know! Also, if you know how to create a protein graph without a contact file, please let me know. Thank you Arian!

a-r-j commented 4 years ago

Hi Rachel,

dgl_graph_from_pdb_code() has an argument edge_construction, which takes a list of values drawn from {'contacts', 'distance', 'custom', 'delaunay'}. The default is ['contacts']. If you change this to ['distance'], you can make graphs based on distance cutoffs or on k-NN clustering of distances (specified using the k_nn arg). You can also use ['delaunay'] to build a graph based on the Delaunay triangulation.

There is also an argument custom_edges that takes a pandas DataFrame of edges if you wish to compute and supply them yourself. You can combine the two by passing ['distance', 'custom'] to edge_construction.

There are other arguments to pass when instantiating the ProteinGraph class. edge_distance_cutoff specifies the threshold in Angstroms for creating an edge between two residues. long_interaction_threshold further filters these edges by specifying the minimum separation between the residues' sequence positions (e.g. we may not always want to connect adjacent residues even if they are close in space).
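To make the distance-based options concrete, here is a minimal pure-Python sketch of what a distance cutoff, a sequence-separation filter and a k-NN rule do to a set of residue coordinates. It is an illustration only, not Graphein's implementation: the function names distance_edges and knn_edges and the toy coordinates are made up for this example, and only the parameter names are borrowed from the arguments described above. Whether Graphein treats the thresholds as inclusive is an assumption here.

```python
import math


def distance_edges(coords, edge_distance_cutoff, long_interaction_threshold=0):
    """Return (i, j) residue pairs that lie within the distance cutoff.

    coords: list of (x, y, z) tuples, one per residue, in sequence order.
    edge_distance_cutoff: maximum distance (Angstroms) for an edge
        (assumed inclusive in this sketch).
    long_interaction_threshold: minimum sequence separation j - i
        (assumed inclusive in this sketch).
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Skip pairs that are too close together in the sequence.
            if j - i < long_interaction_threshold:
                continue
            if math.dist(coords[i], coords[j]) <= edge_distance_cutoff:
                edges.append((i, j))
    return edges


def knn_edges(coords, k_nn):
    """Return directed (i, j) pairs linking each residue to its k_nn nearest neighbours."""
    edges = []
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        edges.extend((i, j) for j in others[:k_nn])
    return edges


# Hypothetical coordinates for four residues (not taken from a real structure).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.0, 0.0)]

print(distance_edges(toy_coords, edge_distance_cutoff=5.0))
print(distance_edges(toy_coords, edge_distance_cutoff=5.0, long_interaction_threshold=2))
print(knn_edges(toy_coords, k_nn=1))
```

With long_interaction_threshold=2, the sequence-adjacent pairs (0, 1), (1, 2) and (2, 3) are dropped even though they fall within the 5 Angstrom cutoff, which is the filtering behaviour described above.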

I appreciate the docs are a bit scant on this, so do let me know if you have any questions or run into any problems.

Re VMD, I think you may need to run brew install netcdf pyqt
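If the "module not found" error persists after a Conda install that reported success, one possible cause is that the script is running under a different interpreter from the environment where vmd-python was installed. A minimal check, assuming only the standard library (the helper name module_available is hypothetical):

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate a top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# Show which Python is actually running and whether it can see vmd.
print("Interpreter:", sys.executable)
print("vmd importable:", module_available("vmd"))
```

If the interpreter path printed here is not inside the Conda environment you installed into, activating that environment before running the script is the first thing to try.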

tk shouldn't be much of an issue, actually. I recently set up a new dev environment on my machine with no issues without installing it.

Atom-level graph construction depends on RDKit and DGL so shouldn't be affected. We're doing a rewrite of Graphein at the moment and we'll be making GetContacts an optional dependency so this should make things easier.

a-r-j commented 4 years ago

Hi Rachel,

Just wondering if you’re getting on okay & if I can close this issue.