epfl-lts2 / pygsp

Graph Signal Processing in Python
https://pygsp.rtfd.io
BSD 3-Clause "New" or "Revised" License

Import & export #31

Closed mdeff closed 5 years ago

mdeff commented 6 years ago

Goals:

@cgallay will implement those features. Any feedback or comment from the community is welcome.

mdeff commented 6 years ago

API

I'd propose to introduce FromNetworkX and FromGraphTool as sub-classes of Graph, such that a pygsp graph can be constructed from a networkx graph as graph = pygsp.graphs.FromNetworkX(graph_nx). The to_networkx and to_graphtool methods of a Graph object would then return a graph object of the respective package. That is, graph_gt = graph.to_graphtool().

For the second step, another sub-class, FromFile, will accept a path to a file and use networkx or graph-tool in the backend to load the graph. That is, graph = pygsp.graphs.FromFile('path/to/graph.format'). A graph will be saved as graph.save('path/to/graph.format').

In addition to the graph, we should import/export one or many signals. Signals are sometimes part of a graph, and it's useful to save them together (e.g., for visualization). In networkx, signals can be represented as node attributes (weights are edge attributes). In graph-tool, signals can be represented as vertex property maps (weights are edge property maps). In both libraries, signals can be named. I'd thus propose to introduce a Graph.signals attribute, which is initialized as an empty dictionary by default. Users can either populate the dictionary before exporting/saving, or retrieve the saved signals by name after importing/loading.
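The mapping between a signals dictionary and named node attributes can be sketched without either library. This is a minimal sketch, assuming a hypothetical ToyGraph stand-in (not the pygsp Graph class) and two illustrative helper names, signals_to_node_attributes and node_attributes_to_signals. It only shows how named signals could be spread over per-node attribute dictionaries and collected back.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph class; not the pygsp API."""

    def __init__(self, n_vertices):
        self.n_vertices = n_vertices
        self.signals = {}  # name -> list holding one value per vertex


def signals_to_node_attributes(graph):
    """Turn {name: values} into one attribute dictionary per node."""
    attributes = [{} for _ in range(graph.n_vertices)]
    for name, values in graph.signals.items():
        if len(values) != graph.n_vertices:
            raise ValueError(f"signal {name!r} must have one value per vertex")
        for node, value in enumerate(values):
            attributes[node][name] = value
    return attributes


def node_attributes_to_signals(attributes):
    """Inverse mapping: collect each named attribute into a signal."""
    signals = {}
    for node, attrs in enumerate(attributes):
        for name, value in attrs.items():
            signals.setdefault(name, [None] * len(attributes))[node] = value
    return signals


graph = ToyGraph(3)
graph.signals["temperature"] = [20.5, 21.0, 19.8]
graph.signals["label"] = [0, 1, 1]

attributes = signals_to_node_attributes(graph)   # export direction
recovered = node_attributes_to_signals(attributes)  # import direction
```

Keeping the signals on the graph object means the export call needs no extra arguments, which matches the preference stated above.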

Considered options

While it's clear that export and save should be implemented with methods, I was not sure if the import and load should be implemented in the constructor or as a factory. Options:

  1. graph = pygsp.graphs.Graph('/path/to/graph.format'): the constructor accepts multiple kinds of objects, such as paths, numpy arrays (adjacency matrices), and networkx or graph-tool graphs. That is convenient, but the documentation might not be clear as Python lacks function overloading.
  2. graph = pygsp.graphs.Graph.load('/path/to/graph.format'): a @classmethod factory is used to build and return the object as the __init__ constructor would do. Each factory builds a Graph from a different object.
  3. graph = pygsp.graphs.load('/path/to/graph.format'): the factory is a module-level function instead of a class method. While that is more convenient for the user, it is less obvious that load returns a Graph object.
  4. graph = pygsp.graphs.FromFile('/path/to/graph.format'): sub-class the Graph base class to overload the constructor. That's more consistent as we do it this way for graph models already. It's also more OOP.
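To make the difference between options 2 and 4 concrete, here is a minimal sketch using a toy Graph class (hypothetical, not the pygsp implementation). An edge list stands in for the networkx graph, graph-tool graph, or file path discussed above; the names from_edge_list, FromEdgeList, and _edges_to_adjacency are assumptions made for illustration.

```python
def _edges_to_adjacency(n_vertices, edges):
    """Build a dense symmetric adjacency matrix from (u, v, weight) triples."""
    adjacency = [[0.0] * n_vertices for _ in range(n_vertices)]
    for source, target, weight in edges:
        adjacency[source][target] = weight
        adjacency[target][source] = weight  # undirected graph
    return adjacency


class Graph:
    """Toy base class: every method of the interface would live here."""

    def __init__(self, adjacency):
        self.adjacency = [list(row) for row in adjacency]
        self.n_vertices = len(self.adjacency)

    # Option 2: a @classmethod factory builds the object from another input.
    @classmethod
    def from_edge_list(cls, n_vertices, edges):
        return cls(_edges_to_adjacency(n_vertices, edges))


# Option 4: a sub-class that exists only to overload the constructor.
class FromEdgeList(Graph):
    def __init__(self, n_vertices, edges):
        super().__init__(_edges_to_adjacency(n_vertices, edges))


edges = [(0, 1, 1.0), (1, 2, 0.5)]
graph_a = Graph.from_edge_list(3, edges)  # option 2
graph_b = FromEdgeList(3, edges)          # option 4
```

Both calls produce the same adjacency matrix. The visible difference is the spelling at the call site and the type of the result: graph_a is a plain Graph, while graph_b is an instance of the sub-class.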

Another option was for the importing/loading functions to return a dictionary of signals instead of storing it as the graph.signals attribute. Exporting would have been done as graph.to_networkx(myfeature1=mysignal1, myfeature2=mysignal2). If those signals are to be exported and saved, it's however clearer if they are part of the Graph object.

Implementation

Networkx and graph-tool should be lazily imported as we don't want them as dependencies. That is, import them inside the body of the functions that need them, not at module level.

Unit tests should be created to test the implementation. I propose to do a round-trip (to networkx, graph-tool, various graph formats) and verify that the graph and signals are preserved.
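A round-trip test along those lines might be sketched as follows. It is an illustration only: the graph is held in plain Python containers rather than a pygsp Graph, the helpers to_networkx and from_networkx are hypothetical, and the test assumes networkx 2.x or later (it is skipped when networkx is not installed).

```python
import importlib.util
import unittest

HAS_NETWORKX = importlib.util.find_spec("networkx") is not None


def to_networkx(n_vertices, edges, signals):
    """Export weights as edge attributes and signals as node attributes."""
    import networkx as nx  # lazy import: optional dependency

    graph_nx = nx.Graph()
    graph_nx.add_nodes_from(range(n_vertices))
    graph_nx.add_weighted_edges_from((u, v, w) for (u, v), w in edges.items())
    for name, values in signals.items():
        nx.set_node_attributes(graph_nx, values=dict(enumerate(values)), name=name)
    return graph_nx


def from_networkx(graph_nx):
    """Import the vertex count, weighted edges, and named signals back."""
    n_vertices = graph_nx.number_of_nodes()
    edges = {
        tuple(sorted((u, v))): data["weight"]
        for u, v, data in graph_nx.edges(data=True)
    }
    signals = {}
    for node, attrs in graph_nx.nodes(data=True):
        for name, value in attrs.items():
            signals.setdefault(name, [None] * n_vertices)[node] = value
    return n_vertices, edges, signals


@unittest.skipUnless(HAS_NETWORKX, "networkx is not installed")
class RoundTripTest(unittest.TestCase):
    def test_networkx_round_trip(self):
        edges = {(0, 1): 1.0, (1, 2): 0.5}
        signals = {"temperature": [20.5, 21.0, 19.8]}
        graph_nx = to_networkx(3, edges, signals)
        n_vertices, edges_back, signals_back = from_networkx(graph_nx)
        # The graph and its signals must survive the round trip unchanged.
        self.assertEqual(n_vertices, 3)
        self.assertEqual(edges_back, edges)
        self.assertEqual(signals_back, signals)
```

The same structure would extend to graph-tool and to file formats by swapping the export and import helpers.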

Again, any suggestion is welcome.

cgallay commented 6 years ago

For FromNetworkX and FromGraphTool, why do we need sub-classes? Could we go for a simpler @classmethod factory instead? We could call it like this: graph = pygsp.graphs.Graph.from_networkx(graph_nx)

mdeff commented 6 years ago

We of course could. That's what I discussed in the "Considered options" above. The main argument for me to go with sub-classes is consistency. Given that we already have pygsp.graphs.Sensor() and the like for graph models, pygsp.graphs.FromNetworkX(graph_nx) is more consistent than pygsp.graphs.Graph.from_networkx(graph_nx). In the end, all the sub-classes of Graph only exist to overload the constructor (be it to build the graph from networkx, to load it from a file, to generate a network from a model, etc.). All the methods (and therefore the interface) are implemented in Graph itself (or in parent classes such as GraphFourier and GraphDifference, which are only meant to modularize the code). See also my StackOverflow answer.

cgallay commented 6 years ago

Here are some implementation details. For Graph Tool:

For Networkx:

mdeff commented 5 years ago

Implemented in #32 and #46.