a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License
1.03k stars, 131 forks

Can't plot adjacency matrix of subgraph #177

Open kamurani opened 2 years ago

kamurani commented 2 years ago

Hi there,

Just getting started with Graphein and so far it's awesome, the docs are very helpful too -- my issue is with trying to plot a subgraph from a protein graph g.

Not sure if I'm doing something wrong, but I cannot plot the subgraph s_g using the adjacency matrix. I get the following error from plotly:

ValueError: The length of the y vector must match the length of the first dimension of the img matrix.

The cause seems to be that my subgraph has 7 nodes, but the adjacency matrix still has all 760 nodes from the original protein.

Not sure why this is the case since I am using the recompute_distmat option?

My code:

k = 3
s_g = extract_k_hop_subgraph(g, central_node=phos_site, k=k, recompute_distmat=True)

print(s_g) # 7 nodes 
print(s_g.graph['dist_mat']) # all original 760 nodes?

fig = plot_distance_matrix(s_g)  # this line gives the error
fig.write_html("visual.html")

Thanks in advance, and apologies if I'm missing something.

a-r-j commented 2 years ago

Hi @cimranm, I'll check this out for you on the weekend. Could you try using filter_dataframe=True and maybe update_coords=True too?

a-r-j commented 2 years ago

Figured it out, typo here: https://github.com/a-r-j/graphein/blob/0a2d5e39787cf002c06b03615d9dd3fe62e0171d/graphein/protein/subgraphs.py#L70

should be "dist_mat", not "distmat". Have pushed a fix to master.
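A toy model of how a mismatched key can produce the symptom reported above. This is not Graphein's code: plain dicts stand in for the graph-level attributes, and `zero_matrix` and `recompute_into` are hypothetical helpers written only for this illustration. It assumes the subgraph starts with a copy of the parent's attributes and that the plotting code reads the "dist_mat" key.

```python
# Toy model only: plain dicts stand in for graph-level attributes.
# `zero_matrix` and `recompute_into` are hypothetical helpers, not Graphein functions.

def zero_matrix(n):
    """Return an n x n placeholder distance matrix."""
    return [[0.0] * n for _ in range(n)]


def recompute_into(parent_attrs, n_sub_nodes, key):
    """Copy the parent's attributes, then store a recomputed matrix under `key`."""
    sub_attrs = dict(parent_attrs)  # assumption: the subgraph inherits parent attributes
    sub_attrs[key] = zero_matrix(n_sub_nodes)
    return sub_attrs


parent = {"dist_mat": zero_matrix(760)}

buggy = recompute_into(parent, 7, key="distmat")   # mismatched key
fixed = recompute_into(parent, 7, key="dist_mat")  # key that is read when plotting

print(len(buggy["dist_mat"]))  # 760: the stale parent matrix is still there
print(len(fixed["dist_mat"]))  # 7
```

With the mismatched key, the recomputed 7-node matrix exists but sits next to the old 760-node one, which matches the length mismatch in the plotly error.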

kamurani commented 2 years ago

Hey @a-r-j , yep I figured the same thing but was waiting to hear back from you -- I actually modified visualisation.py to use g.graph["distmat"] as I wasn't sure what behaviour you wanted (i.e. the original adjacency matrix should still be accessible from a subgraph?)

Anyway, thanks for the fix :)

kamurani commented 2 years ago

@a-r-j another issue: when the subgraph is plotted, the node labels are assigned to the x and y axes randomly (but the actual distance matrix cols / rows are plotted in sequence order). At the moment I've fixed this using the following:

# in graphein.protein.visualisation.plot_distance_matrix

x_range = list(g.nodes)
x_range = sorted(x_range, key=lambda x: int(x.split(':')[-1]))

Hope that makes sense!

a-r-j commented 2 years ago

Good spot. Seems like the start of a solution, but I think your solution breaks down with multichain proteins, no?
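A minimal sketch of why sorting on residue number alone could break down with several chains, and one possible chain-aware key. It assumes node IDs look like "A:SER:10" (chain, residue name, residue number, with the number as the last field, as in the snippet above). `chain_aware_key` is a hypothetical helper, not part of Graphein, and insertion codes are not handled.

```python
# Assumes node IDs of the form "chain:residue_name:residue_number".
# `chain_aware_key` is a hypothetical helper for illustration only.

def chain_aware_key(node_id):
    """Sort by chain ID first, then by integer residue number."""
    parts = node_id.split(":")
    return (parts[0], int(parts[-1]))


nodes = ["A:SER:10", "B:GLY:2", "A:THR:1", "B:ALA:11"]

# Residue number alone interleaves the two chains.
by_number = sorted(nodes, key=lambda n: int(n.split(":")[-1]))
# Chain first keeps each chain contiguous.
by_chain = sorted(nodes, key=chain_aware_key)

print(by_number)  # ['A:THR:1', 'B:GLY:2', 'A:SER:10', 'B:ALA:11']
print(by_chain)   # ['A:THR:1', 'A:SER:10', 'B:GLY:2', 'B:ALA:11']
```

Sorted labels only help if the matrix rows and columns follow the same order, which this sketch does not check.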

kamurani commented 2 years ago

@a-r-j yep you're right, shall I give it a go and submit a pull request?

On another note, do you think it would be appropriate to allow other options for ordering the axes? At the moment I'm trying to visualise a protein motif with the adjacency matrix and was thinking of ordering by distance from the central node (i.e. central node AA is the first; then AAs are in ascending order of Euclidean distance). Would it make sense to have a feature within the plot functions that changes the ordering?
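A rough sketch of the ordering described above, using only the standard library. The coordinates and node IDs are made up, and `order_by_distance` and `reorder_matrix` are hypothetical helpers rather than existing Graphein functions. The idea is to compute an order with the central node first, then permute both the rows and the columns of the matrix to match it.

```python
import math

# Hypothetical helpers for illustration; coordinates below are invented.

def order_by_distance(coords, central_node):
    """Return node IDs with the central node first, then by ascending distance."""
    centre = coords[central_node]
    return sorted(
        coords,
        key=lambda n: (n != central_node, math.dist(coords[n], centre)),
    )


def reorder_matrix(matrix, current_order, new_order):
    """Permute rows and columns of a square matrix to follow `new_order`."""
    index = {node: i for i, node in enumerate(current_order)}
    idx = [index[node] for node in new_order]
    return [[matrix[i][j] for j in idx] for i in idx]


coords = {
    "A:THR:9": (3.0, 4.0, 0.0),
    "A:SER:10": (0.0, 0.0, 0.0),
    "A:GLY:11": (1.0, 0.0, 0.0),
}
current = list(coords)
dist_mat = [[math.dist(coords[a], coords[b]) for b in current] for a in current]

new_order = order_by_distance(coords, "A:SER:10")
reordered = reorder_matrix(dist_mat, current, new_order)

print(new_order)     # central node first, then nearest to farthest
print(reordered[0])  # distances from the central node, in ascending order
```

The reordered matrix and `new_order` could then be used together as the matrix and the axis labels, so the two stay in step.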

a-r-j commented 2 years ago

Yep, go ahead with a PR. Happy to review & support :)

That's an interesting use case you've described. I think it would be a good addition. The plot functions accept a distance matrix (separately from a graph) which I anticipated could be used for this sort of thing. However, if this is something that would be really useful for you it's probably useful for others and so we should support it.

a-r-j commented 2 years ago

Hiya,

I’m away at the moment but I’ll check this out for you in the next few days.

In the meantime could you try using filter_dataframe=True and maybe update_coords=True too?
