GraSP-toolbox / GraSP

GraSP: a Graph Signal Processing toolbox for Matlab
Other
35 stars 19 forks source link

Consider decoupling layout #3

Closed iglesias closed 1 year ago

iglesias commented 1 year ago

First of all, thanks for the repository.

As a suggestion, I think it would be advantageous to remove a visualization concern out of the algebraic type. Is there any reason why it was designed to be an intrinsic property of the graph?

bgirault-inria commented 1 year ago

What kind of drawback do you see for these fields? or what kind of advantage for moving them out ?

Essentially, any field in grasp_struct intended for plotting the graph is optional, and default values are used if not set. For example, for the layout field, grasp_show_graph generates a new layout if it is missing. Note that except for A_layout, these fields can be overwritten using optional parameters to grasp_show_graph.

In addition, I haven't come across a cost for these fields, when they are set. Even copying the graph is efficient due to the CoW memory management mechanism in Matlab.

Also, some functions generating graphs naturally generate a layout associated to it. Including it in the structure makes it transparent to the user and avoids having to deal with several data structures.

iglesias commented 1 year ago

I see that moving them out would improve separation of concerns.

Ah, interesting that it does not affect even copying.

In the plotting functionality it could, of course, also be implemented transparently and without such additional overhead.

bgirault-inria commented 1 year ago

Sounds like you are talking about the S in the SOLID design principles in OOP: Single responsibility. There are 2 reasons for taking a step back on these principles: (i) Matlab is not a full featured OOP language and the GraSP toolbox does not leverage OOP, and (ii) SOLID are only guidelines and it is ill-advised to strictly adhere to them.

In the case of the graph structure, it is only a collection of fields, with methods implemented elsewhere: these are not objects. As such, having multiple layers of structures just to adhere to the single responsibility guideline would make the toolbox obscure and hard to use by the signal processing community. The goal of the current design is to strike a balance between ease of use, and clean code.

At the moment, these fields do not bring additional overhead. Please provide a MWE if you find an overhead.

bgirault-inria commented 1 year ago

Sounds like an older version (from the 70s) of single responsibility (from 2004). In any case, as stated in the wikipedia article you cite: "so despite the many benefits of well-separated concerns, there is often an associated execution penalty". This is exactly my point: don't get caught up in the guidelines, or what you learnt in class (and I say that as a former professor in OOP) or on Youtube, and look at the particular piece of software we are talking about: GraSP is a small toolbox, and it will never get much bigger than it is already. Being strict about theses guidelines would only achieve 2 things: making the program more complex and actually less efficient (overhead will actually increase since the guidelines are not about computationally efficient code but about maintainability), and having fewer users.

If we get caught up in single responsibility, we would end up splitting all elements of a the adjacency matrix in separate data structures, because each is responsible of a single piece of information: the weight of the associated edge. However, doing so, we would loose all possible optimizations Matlab has on manipulating matrices and vectors. This is of course a silly example, but it goes to show that you have to think carefully at where you draw the line on where to stop implementing the guideline. Sometimes, there are arguments for drawing the line at some other point. However, in the case of GraSP, I stand by my arguments on keeping it as is. We can of course debate them, even if you don't have a MWE, but you really have to give more details.

bgirault-inria commented 12 months ago

Obviously, without needing a MWE, storing the layout alongside the adjacency matrix constrains the largest graph the library can handle. Providing there's no sparsity or is not leveraged, the memory cost is quadratic with the number of nodes and with GraSP there's an additional layout field whose size also grows with the number of nodes. But I do not think people are using GraSP for big data, so this should not really be an issue on practice other than bounding to smaller graphs.

This is the type of content you should have written in the original issue, since your original one was very cryptic, and, honestly, wasted my time.

Here, there are a couple of things that become clear due to the fact that you probably haven't used Matlab or GraSP that much. First of all, as I said before, layout is an optional field, therefore it does not constraint the size of the graph. Check the documentation of grasp_struct and the code of grasp_show_graph. You should also check other graphs, such as grasp_erdos_renyi where it is clear that layout is optional. Even then, storing the full adjacency matrix may be indeed quadratic, but, you failed to see that the layout size is only linear, so the cost of adding the layout is negligible compared to a quadratic cost. If anything, you should be looking at the fields M, Z, Fand Finv for a memory complexity analysis. But, for these, what's the point of doing GSP without a GFT? these fields are optional anyway...

As for your comment on sparsity, there is no point in leveraging a sparsity pattern of a clique (a graph that no user of GraSP should be using anyway). However, if we are dealing with big data, which you are right is not the target of this toolbox, nor is it the target of GSP in general, sparsity becomes important but storage cost is not quadratic. This is especially easy to do in Matlab, so one cannot assume that matrix storage cost is quadratic as your comment suggests. And GraSP is not making this assumption. In that case, memory cost is still dominated by the adjacency matrix, but, again, layout is an optional field, such that there is absolutely no constraint on the size of the graph from storing layout in the structure.

Remember to be mindful of developers' time when you post issues. And be mindful of what you find "obvious", these may be oversights on your part, as is the case here.