a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License
1.03k stars 131 forks

C_beta_vector: altlocs #148

Closed avivko closed 2 years ago

avivko commented 2 years ago

Describe the bug: Running fig = add_vector_to_plot(g, fig, "c_beta_vector") raises an IndexError for some PDBs (e.g. 6rew, 7w9w), apparently because some vectors are only 2D:

IndexError                                Traceback (most recent call last)
Input In [31], in <cell line: 10>()
      8 fig = plotly_protein_structure_graph(g, node_size_multiplier=1)
      9 fig = add_vector_to_plot(g, fig, "sidechain_vector", colour="red", scale=1.5)
---> 10 fig = add_vector_to_plot(g, fig, "c_beta_vector", colour="blue", scale=1.5)
     11 fig = add_vector_to_plot(g, fig, "sequence_neighbour_vector_n_to_c", colour="green", scale=1.5)
     12 fig

File /glusterfs/dfs-gfs-dist/kormanav/miniconda3/envs/graphein-gpu/lib/python3.8/site-packages/graphein/protein/visualisation.py:447, in add_vector_to_plot(g, fig, vector, scale, colour, width)
    440     x_edges.extend(
    441         [d["coords"][0], d["coords"][0] + d[vector][0] * scale, None]
    442     )
    443     y_edges.extend(
    444         [d["coords"][1], d["coords"][1] + d[vector][1] * scale, None]
    445     )
    446     z_edges.extend(
--> 447         [d["coords"][2], d["coords"][2] + d[vector][2] * scale, None]
    448     )
    449     edge_text.extend([None, f"{vector}", None])
    451 edge_trace = go.Scatter3d(
    452     x=x_edges,
    453     y=y_edges,
   (...)
    458     hoverinfo="text",
    459 )

IndexError: index 2 is out of bounds for axis 0 with size 2

To Reproduce Steps to reproduce the behavior:

from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
from graphein.protein.features.nodes.geometry import add_sidechain_vector, add_beta_carbon_vector, add_sequence_neighbour_vector
from graphein.protein.visualisation import plotly_protein_structure_graph, add_vector_to_plot

config = ProteinGraphConfig()
g = construct_graph(config=config, pdb_code="6rew")

add_sidechain_vector(g) #  works
add_beta_carbon_vector(g) # works
add_sequence_neighbour_vector(g) # works

fig = plotly_protein_structure_graph(g, node_size_multiplier=1)
fig = add_vector_to_plot(g, fig, "sidechain_vector", colour="red", scale=1.5)  # works
fig = add_vector_to_plot(g, fig, "c_beta_vector", colour="blue", scale=1.5)  # does not work
fig = add_vector_to_plot(g, fig, "sequence_neighbour_vector_n_to_c", colour="green", scale=1.5) # works
fig

This yields the error message above.

Expected behavior: Vectors should be computable in 3D for all PDBs and amino acids, and all vectors should be plottable. I would expect all vectors to be 3D, with missing dimensions just having 0 in their corresponding index, e.g. (2, 3) --> (2, 3, 0)


a-r-j commented 2 years ago

Hi @avivko, thanks for raising this issue. It looks like the vector computation functions aren't accounting for possible altlocs and therefore compute a vector for each possible CB position (they're still 3D vectors, but there are two of them). I'll try to figure out an elegant solution for this. In the meantime, you can use something like pdbtools to clean up the structure and remove undesired altlocs.
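To illustrate the failure mode described above, here is a minimal sketch using made-up coordinates rather than Graphein's actual code. The variable names (ca, cb_altlocs, c_beta_vector) and the values are assumptions for illustration only. When a residue has two CB altlocs, subtracting the CA position gives a (2, 3) array instead of a single 3D vector, so indexing the z component with [2] fails in the same way as the traceback.

```python
import numpy as np

# Hypothetical coordinates: one CA atom and two alternate CB positions
# (altloc A and altloc B) for the same residue.
ca = np.array([0.0, 0.0, 0.0])
cb_altlocs = np.array(
    [
        [1.0, 1.0, 1.0],  # CB, altloc A
        [1.2, 0.8, 1.1],  # CB, altloc B
    ]
)

# Broadcasting subtracts ca from each row: two 3D vectors, shape (2, 3).
c_beta_vector = cb_altlocs - ca
shape = c_beta_vector.shape

# Plotting code that expects a single 3D vector reads index 2 as "z".
# On a (2, 3) array, index 2 addresses a third row that does not exist.
try:
    z = c_beta_vector[2]
    message = None
except IndexError as err:
    message = str(err)

print(shape)
print(message)
```

So the vectors are not really 2D: the size-2 axis counts altlocs, not spatial dimensions, which is why padding with a zero would not be the right fix.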

avivko commented 2 years ago

Hi @a-r-j. Good job figuring that out so quickly! Yeah, I assumed that cleaning the PDBs would solve the problem, but wanted to let you know about this bug so that you'd be able to find the problem and come up with a general solution. And just for general knowledge: How are altlocs generally handled in Graphein? Do you double up the nodes and edges?

a-r-j commented 2 years ago

We try to remove them in the graph construction. The trouble was that the cbeta function relied on the raw_pdb_df which we keep as metadata for traceability. I forgot we also compute a clean rgroup dataframe in construction so it makes much more sense to use that instead.
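As a rough sketch of what removing altlocs from an atom table can look like, the snippet below keeps the highest-occupancy record per atom. This is an assumed strategy for illustration, not necessarily what Graphein does internally. The helper name keep_highest_occupancy is hypothetical, and the column names (chain_id, residue_number, atom_name, alt_loc, occupancy) are assumed to follow a BioPandas-style layout.

```python
import pandas as pd


def keep_highest_occupancy(atoms: pd.DataFrame) -> pd.DataFrame:
    """Keep one record per atom, preferring the highest occupancy.

    Hypothetical helper: an atom is identified by chain, residue number
    and atom name; alternate locations share all three.
    """
    key = ["chain_id", "residue_number", "atom_name"]
    return (
        atoms.sort_values("occupancy", ascending=False, kind="stable")
        .drop_duplicates(subset=key, keep="first")
        .sort_index()  # restore the original atom order
    )


# Toy data: one residue with a single CA and two CB altlocs.
atoms = pd.DataFrame(
    {
        "chain_id": ["A", "A", "A"],
        "residue_number": [10, 10, 10],
        "atom_name": ["CA", "CB", "CB"],
        "alt_loc": ["", "A", "B"],
        "occupancy": [1.0, 0.6, 0.4],
    }
)

clean = keep_highest_occupancy(atoms)
print(clean)
```

With one CB left per residue, a CB minus CA subtraction yields a single 3D vector, which is what the plotting code expects.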

I just opened a PR and added some tests for your examples. Will get it merged in once tests pass :)

a-r-j commented 2 years ago

Now fixed in 1.3.0 :)