Closed by perellonieto 7 years ago
Nice - something we've talked about doing for ages! Some comments:
I can't quite decide whether this belongs in the viewer or the core, since it's not an obvious match for either. In the core it means extra dependencies, but the viewer is really just a web-app. Although for sure the web-app should be able to display these if they're available.
There is a missing element to make this work nicely, which is that currently the streams don't store back-references to the tool that created them, so you can't actually traverse back up the graph. I had a look and there is a function set_tool_reference(), but it's currently not called anywhere.
We tend to draw things as factor graphs: variables in the graph would be what we call nodes, and factors are factors. Note that a node is simply a collection of streams that differ only in their metadata (plate values), and a factor has a tool and the logic to operate over plates. The examples we have so far are kind of outside of this, but you can think of a stream->tool chain as the same as a node->factor chain when there is no surrounding plate (and hence only one stream in the plate). At any rate, it would be nice to draw it as a factor graph, which is one of the reasons we hadn't jumped at Graphviz, as it's not easy to draw factor graphs with it. I did wonder about somehow using TikZ Bayesnet, which is how we draw our factor graphs in papers, but this is clearly harder to automate.
I kind of see what you were doing by putting the data destinations at the beginning and end of the chain, but really every stream can basically live in any location (for example, you could swap the channels for sea_ice_stream and sea_ice_sum_stream and your code would execute in the same way, except that different data is stored in the database). I wonder whether you could color-code the nodes instead to indicate the channel? The square box for the CSV file is still valid though.
Saying that the tools are in memory is a bit of a weird one: yes, the tools are loaded into memory, but of course they originated from the tool channel, which is a file channel. However, that's probably something that readers at this stage don't really need to know about, so I would probably just take that out of the table.
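To make the back-reference point concrete, here is a rough sketch of how the graph could be walked once each stream remembers the tool that produced it. Everything in it is an assumption for illustration: the Tool and Stream classes are toy stand-ins rather than the library's real classes, the set_tool_reference() signature is guessed, and sum_tool and the "memory" channel are made-up names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tool:
    # Toy stand-in: a tool with a name and the streams it reads from.
    name: str
    sources: List["Stream"] = field(default_factory=list)


@dataclass
class Stream:
    # Toy stand-in: a stream that may remember the tool that produced it.
    name: str
    channel: str
    tool: Optional[Tool] = None

    def set_tool_reference(self, tool: Tool) -> None:
        # Hypothetical signature; the real function may differ.
        self.tool = tool


def upstream_edges(stream):
    """Walk back from a stream and collect (source, destination) edges."""
    edges = []
    seen = set()
    stack = [stream]
    while stack:
        current = stack.pop()
        # Streams without a back-reference are the roots of the walk.
        if current.name in seen or current.tool is None:
            continue
        seen.add(current.name)
        tool = current.tool
        edges.append((tool.name, current.name))
        for source in tool.sources:
            edges.append((source.name, tool.name))
            stack.append(source)
    return edges


# Illustrative chain: sea_ice_stream -> sum_tool -> sea_ice_sum_stream
raw = Stream("sea_ice_stream", "memory")
summed = Stream("sea_ice_sum_stream", "memory")
summed.set_tool_reference(Tool("sum_tool", [raw]))
edges = upstream_edges(summed)
```

Without the back-reference, the walk stops immediately: upstream_edges(raw) returns an empty list because raw has no tool attached, which is the situation all streams are in today.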
I think that it would be nice to be able to print all the elements involved in a composition of streams. If we use Graphviz, it would be possible to create the directed graph, given that we know all the information about the nodes. Then it is possible to output the graph in any of the formats that Graphviz supports, such as bitmaps, vector images, JSON and plain text, and it can also be displayed directly in a Jupyter Notebook.
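As a minimal sketch of what this could look like, the snippet below builds Graphviz DOT source by hand from a description of the streams, tools and edges, with streams filled by a color that depends on their channel. The function to_dot, the tool name sum_tool, the channel names and the color choices are all assumptions for illustration and are not part of the existing code.

```python
def to_dot(streams, tools, edges, channel_colors):
    """Return DOT source: streams as colored ellipses, tools as boxes."""
    lines = ["digraph streams {", "  rankdir=LR;"]
    for name, channel in streams.items():
        # Fall back to white for channels without an assigned color.
        color = channel_colors.get(channel, "white")
        lines.append(
            f'  "{name}" [shape=ellipse, style=filled, fillcolor="{color}"];'
        )
    for name in tools:
        lines.append(f'  "{name}" [shape=box];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative data: stream name -> channel, plus one hypothetical tool.
streams = {"sea_ice_stream": "memory", "sea_ice_sum_stream": "database"}
tools = ["sum_tool"]
edges = [("sea_ice_stream", "sum_tool"), ("sum_tool", "sea_ice_sum_stream")]
colors = {"memory": "lightblue", "database": "lightyellow"}

dot = to_dot(streams, tools, edges, colors)
print(dot)
```

The resulting text can be saved to a file and rendered with the Graphviz command line, for example dot -Tsvg graph.dot -o graph.svg, or passed to graphviz.Source from the Python graphviz package to display it in a notebook.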
Here is an example of a portion of the tutorial that I am writing. In this case, being able to visualize the graph would be beneficial to understand how it works.