epfl-lts2 / pygsp

Graph Signal Processing in Python
https://pygsp.rtfd.io
BSD 3-Clause "New" or "Revised" License
488 stars 93 forks source link

2d plotting with a 3d coordinate set# #26

Open stanleyjs opened 6 years ago

stanleyjs commented 6 years ago

I would like to plot only two of a graph's dimensions. Oftentimes this is desirable because working in 3D can be hard to wrap one's head around. This would be something like a keyword argument g.plot(dims=[1,2]) or such.

I poked my head in the plotting code and it appears that the bulk of the logic in the _plt* and _qtg* functions are built atop calls to _get_coords(G) and conditionals on G.coords.shape. This makes it a little bit difficult to just insert a dims parameter. It could be done, but it'd require a refactor. One hacky option is to alter _plot_graph and _plot_signal (and their upstream methods in graphs.py), to have dims=None as a keyword parameter and have

tmp = G.coords 
if dims and len(dims)<G.coords.shape[1]: 
     G.set_coordinates(tmp[:,dims])
## normal plotting (backend call)
G.set_coordinates(tmp)

Note that this doesn't consume any extra memory due to assignment by reference.

mdeff commented 6 years ago

Thanks for your suggestion @stanleyjs. I agree with your point, and in general graphs might even be embedded in more that 3D.

Then, is it better to:

  1. have a keyword argument which selects dimensions (as you propose),
  2. let the user set G.coords as he wants.

(1) assumes that G.coords contains an embedding of the graph in any dimension, and (2) assumes that G.coords is specifically the 1-2-3 D embedding that will be used for plotting. With (2), a user might set a G.embedding matrix with the full n-dimensional embedding and then call G.set_coordinates(G.embedding[:,dims]).

What's your thoughts?

stanleyjs commented 6 years ago

So as it stands (2) is essentially already in place, we just don't have a G.embedding attribute. It's not a problem, but I do think that shuffling around the coordinates using calls to G.set_coordinates(foo[:,slice]) is a little cumbersome. As for positives: (2) does solve an issue with keyword arguments for N-dimensional embeddings - setting the default slice. (2) also does not need any new code, and we won't have to change the dimension checking in graphs.Graph.set_coordinates().

So (1) requires new plotting logic, new dimension checking logic in set_coordinates, and some decision on what the default should be (the obvious choice is [0,1,2]). On the other hand, the call to plot a specific set of dimensions is one line rather than 2. If one wants to plot a set of views of an embedding, this does start to matter. Finally, it does seem stylistically more organic to me to be able to slice with the plotting function - one can do this inline with plt.scatter after all, it just doesn't have the nice pygsp wrapping.

I will note that adding the keyword arg (1) does not completely abrogate (2).

mdeff commented 6 years ago

I hear your arguments. If we can have a nice implementation and not a hack that'll add to our technical debt I'd be willing to accept a PR.

I think the default should be to plot with all the dimensions available in G.coords, and throw an error if that's more than 3.

While doing that, we could add an ndim argument to set_coordinates to avoid duplicates like random2D and random3D, and generate a random embedding in ndim dimensions.

Passing dims=[1, 0, 2] to the plot would also exchange the x and y axes right?