Closed (thejmazz closed 7 years ago)
See the current JSON representation here. Note that nodes are logged more than once if they are children of other nodes. This JSON graph structure is probably not ideal, or at least should be created from another structure of the form { nodes: [], edges: [] }.
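To make the { nodes: [], edges: [] } idea concrete, here is a minimal sketch of flattening a nested representation into that form, recording each node and edge only once. The nested shape ({ id, children }) and the name flattenGraph are assumptions for illustration; the actual JSON that watermill logs may look different.

```javascript
// Hypothetical nested shape: { id: string, children: [ ...same shape ] }.
// The real watermill JSON may differ; this only illustrates the flattening.
function flattenGraph(roots) {
  const nodes = new Map();    // id -> { id }, so each node is recorded once
  const edgeKeys = new Set(); // serialized [source, target] pairs already seen
  const edges = [];

  function visit(node) {
    if (!nodes.has(node.id)) {
      nodes.set(node.id, { id: node.id });
    }
    for (const child of node.children || []) {
      const key = JSON.stringify([node.id, child.id]);
      if (!edgeKeys.has(key)) {
        edgeKeys.add(key);
        edges.push({ source: node.id, target: child.id });
      }
      visit(child);
    }
  }

  roots.forEach(visit);
  return { nodes: [...nodes.values()], edges };
}

// "c" is a child of both "a" and "b", so it appears twice in the nested form.
const nested = [
  { id: "a", children: [{ id: "c", children: [] }] },
  { id: "b", children: [{ id: "c", children: [] }] },
];

const flat = flattenGraph(nested);
console.log(JSON.stringify(flat, null, 2));
```

In the flat form, "c" is listed once in nodes and is referenced by two edges, which avoids the duplication seen in the nested output.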
It could be useful to use a more standard graph format. The ideal would be:
Some links:
Even OBO could work: it could have a node for each file, with "created_by" and "used_by" edges to different task nodes.
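As a rough illustration of that idea (not actual OBO syntax), file and task nodes with typed edges could be held in the same nodes/edges structure. All ids, the kind field, and the relatedTasks helper are hypothetical; only the "created_by" and "used_by" edge names come from the comment above.

```javascript
// Hypothetical ids and fields; only "created_by" and "used_by" come from the thread.
const provenance = {
  nodes: [
    { id: "task:download", kind: "task" },
    { id: "task:align", kind: "task" },
    { id: "file:reads.fastq", kind: "file" },
  ],
  edges: [
    { source: "file:reads.fastq", target: "task:download", type: "created_by" },
    { source: "file:reads.fastq", target: "task:align", type: "used_by" },
  ],
};

// Return the ids of the tasks linked to a file by a given edge type.
function relatedTasks(graph, fileId, edgeType) {
  return graph.edges
    .filter((e) => e.source === fileId && e.type === edgeType)
    .map((e) => e.target);
}

console.log(relatedTasks(provenance, "file:reads.fastq", "created_by"));
console.log(relatedTasks(provenance, "file:reads.fastq", "used_by"));
```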
We now have this kind of structure and a simple graph visualization with d3, available at localhost:8084 when watermill is running, though it still lacks operationString.
Shall we close this? Of course, graph visualization can be further improved, but for now we have a simple DAG visualization tool.
Yes, let's close for now. We can always make an issue for more specific improvements.
It is useful to have a visual representation of the Directed Acyclic Graph (DAG) that is produced during the execution of a pipeline.
In the graph,
|
as the child node of two parent nodes.

TODO actual graph diagrams

join(A, B) creates the DAG

a ---> b

junction(A, B) creates the DAG

join(junction(A, B), C) creates the DAG

join(A, fork(X, Y), C) creates the DAG

The redux reducer for the DAG is here. It uses graph.js.
The graph exists in the store under the path collection (i.e. a valid selector would be (state) => state.collection).

A function jsonifyGraph is also exported. This is needed because the graph object from graph.js is not serializable; jsonifyGraph creates a serializable JSON representation of the graph.
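The selector and the need for a serializable form can be sketched as follows. This is not the real jsonifyGraph and does not use the graph.js API: the Map-based graph is a stand-in for any non-serializable graph object, and toSerializable and selectGraph are hypothetical names.

```javascript
// Stand-in for a non-serializable graph object (NOT the graph.js API):
// Maps and Sets are silently dropped by JSON.stringify.
const state = {
  collection: {
    vertices: new Map([
      ["a", { name: "A" }],
      ["b", { name: "B" }],
    ]),
    edges: new Map([["a", new Set(["b"])]]),
  },
};

// Selector for the DAG, as described in the text.
const selectGraph = (state) => state.collection;

// Hypothetical conversion to a plain { nodes, edges } object.
function toSerializable(graph) {
  const nodes = [...graph.vertices].map(([id, value]) => ({ id, value }));
  const edges = [];
  for (const [source, targets] of graph.edges) {
    for (const target of targets) {
      edges.push({ source, target });
    }
  }
  return { nodes, edges };
}

// Stringifying the Map directly loses all of its entries.
console.log(JSON.stringify(selectGraph(state).vertices));

// Converting first keeps every vertex and edge.
const json = JSON.stringify(toSerializable(selectGraph(state)));
console.log(json);
```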
See here how the collection (aka DAG) is logged out during task resolution for debug.
A first implementation of this could be to write the JSON graph to disk during the pipeline execution, overwriting the previous file whenever an ADD_OUTPUT or ADD_JUNCTION_VERTEX action has been dispatched (i.e. whenever the state of the DAG changes). This way, if a task fails, at least the last good graph is stored.

Then it is a matter of parsing that JSON into a visualization using something like d3.
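One way this could look is a Redux-style middleware that rewrites the file after each graph-changing action. This is a sketch under assumptions: createGraphWriter, filePath, and the serialize callback are hypothetical names, and the write-then-rename step is an extra safeguard not mentioned above. Only the two action types and the state.collection path come from the text.

```javascript
const fs = require("fs");

// Action types named in the text; everything else here is a hypothetical sketch.
const GRAPH_ACTIONS = new Set(["ADD_OUTPUT", "ADD_JUNCTION_VERTEX"]);

// Redux-style middleware: store => next => action => result.
// `serialize` turns the graph held in state.collection into a plain object.
function createGraphWriter(filePath, serialize) {
  return (store) => (next) => (action) => {
    // Let the reducers update the state before reading the graph.
    const result = next(action);

    if (GRAPH_ACTIONS.has(action.type)) {
      const graph = serialize(store.getState().collection);
      const json = JSON.stringify(graph, null, 2);

      // Write to a temporary file and rename it, so that a crash in the
      // middle of a write does not leave a truncated graph file behind.
      const tmpPath = `${filePath}.tmp`;
      fs.writeFileSync(tmpPath, json);
      fs.renameSync(tmpPath, filePath);
    }

    return result;
  };
}
```

With redux, the returned function would be passed to applyMiddleware when creating the store, with something like jsonifyGraph as the serialize argument. Synchronous writes keep the file consistent with the latest state, at the cost of blocking briefly on each graph change.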
Suggestions to improve the way the graph is handled within watermill are welcome. Perhaps there is a better serializable format to use (e.g. graphml format).
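If GraphML were chosen, a plain { nodes, edges } object could be turned into a minimal GraphML document roughly as below. This is only a sketch: toGraphML and escapeXml are hypothetical helpers, the input shape is assumed, and no node or edge attributes (GraphML key/data elements) are emitted.

```javascript
// Escape characters that are not allowed verbatim in XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Assumed input: { nodes: [{ id }], edges: [{ source, target }] }.
function toGraphML({ nodes, edges }) {
  const nodeLines = nodes.map((n) => `    <node id="${escapeXml(n.id)}"/>`);
  const edgeLines = edges.map(
    (e) =>
      `    <edge source="${escapeXml(e.source)}" target="${escapeXml(e.target)}"/>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
    '  <graph id="G" edgedefault="directed">',
    ...nodeLines,
    ...edgeLines,
    "  </graph>",
    "</graphml>",
  ].join("\n");
}

const graphml = toGraphML({
  nodes: [{ id: "a" }, { id: "b" }],
  edges: [{ source: "a", target: "b" }],
});
console.log(graphml);
```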
BONUS