Closed mdeff closed 5 years ago
I'd propose to introduce FromNetworkX
and FromGraphTool
as sub-classes of Graph
, such that a pygsp graph can be constructed from a networkx graph as graph = pygsp.graphs.FromNetworkX(graph_nx)
. The to_networkx
and to_graphtool
methods of a Graph
object would then return a graph object of the respective package. That is, graph_gt = graph.to_graphtool()
.
For the second step, another sub-class, FromFile
, will accept a path to a file and use networkx or graph-tool in the backend to load the graph. That is, graph = pygsp.graphs.FromFile('path/to/graph.format')
. A graph will be saved as graph.save('path/to/graph.format')
.
In addition to the graph, we should import/export one or many signals. Signals are sometimes part of a graph, and it's useful to save them together (e.g., for visualization). In networkx, signals can be represented as node attributes (weights are edge attributes). In graph-tool, signals can be represented as vertex property maps (weights are edge property maps). In both implementation, signals can be named. I'd thus propose to introduce a Graph.signals
attribute, which is initialized as an empty dictionary by default. Users can either populate the dictionary before exporting/saving, or retrieve the saved signals by name when importing/loading.
While it's clear that export and save should be implemented with methods, I was not sure if the import and load should be implemented in the constructor or as a factory. Options:
graph = pygsp.graphs.Graph('/path/to/graph.format')
: constructor accepts multiple objects, such as paths, numpy array (adjacency matrix), networkx and graph-tool graphs. That is convenient but the documentation might not be clear as Python lacks function overloading.graph = pygsp.graphs.Graph.load('/path/to/graph.format')
: a @classmethod
factory is used to build and return the object as the __init__
constructor would do. Each factory builds a Graph
from a different object.graph = pygsp.graphs.load('/path/to/graph.format')
: the factory is a module-level function instead of a class method. While that is more convenient for the user, it lacks a clear meaning that load
will return a Graph
object.graph = pygsp.graphs.FromFile('/path/to/graph.format')
: sub-class the Graph
base class to overload the constructor. That's more consistent as we do it this way for graph models already. It's also more OOP.Another option was for the importing/loading functions to return a dictionary of signals instead of storing it as the graph.signals
attribute. Exporting would have been done as graph.to_networkx(myfeature1=mysignal1, myfeature2=mysignal2)
. If those signals are to be exported and saved, it's however clearer if they are part of the Graph
object.
Networkx and graph-tool should be lazily imported as we don't want them as dependencies. That is, import them in the function definition not the module.
Unit tests should be created to test the implementation. I propose to do a round-trip (to networkx, graph-tool, various graph formats) and verify that the graph and signals are preserved.
Again, any suggestion is welcome.
For FromNetworkX
and FromGraphTool
why do we need to do sub-classes? Could we go for simpler @classmethod factory
instead? That we could call like that :
graph = pygsp.graphs.Graph.from_networkx(graph_nx)
We of course could. That's what I discussed in the "Considered options" above. The main argument for me to go with sub-classes is to be consistent. Given that we already have pygsp.graphs.Sensor()
and the like for graph models, pygsp.graphs.FromNetworkX(graph_nx)
is more consistent than pygsp.graphs.Graph.from_networkx(graph_nx)
. In the end all the sub-classes of Graph
only exists to overload the constructor (be it to take it from networkx, to load it from a file, to generate a network from a model, etc.). All the methods (and therefore the interface), is implemented in Graph
itself (or parent classes such as GraphFourier
and GraphDifference
, which are only mint to modularize the code). See also my StackOverflow answer.
Here are some implementation details. For Graph Tool:
The weight of the edges are stored as an edge property
therefore for export, the method to_graphtool(...) will take an argument edge_property_name
that will be used as a key for the edge property of the generated graphtool object. When importing a graph from graphtool the same argument will be used and the corresponding property will be used as a weight for the edges
Multiple edges are allowed in graphtool but not in pygsp therefore when importing a graph containing more than one edge between two vertexes I suggest that a warning is shown to the user letting him know that the sum was used to aggregate the multiple vertexes into one. For more flexibility, the user will have the possibility to set a custom aggregate function with the agg_mutli_edge_fn
argument.
For Networkx:
from_scipy_sparse_matrix
the attribute is attribute name is weight
. For export, the method to_networkx(...) will take an argument edge_property_name
(for consistency with graphtool) where the weight of the edges will be stored. When importing the user will have to choose which attribute he wants to use as weight (see)Please let me know if you like the implementation or give me feedback on how to improve. I will update this post with more implementation details as I code.
Implemented in #32 and #46.
Goals:
@cgallay will implement those features. Any feedback or comment from the community is welcome.