AxFoundation / strax

Stream analysis for xenon TPCs
BSD 3-Clause "New" or "Revised" License
27 stars 38 forks source link

Plugin tree depth, new context propperty #439

Open JoranAngevaare opened 3 years ago

JoranAngevaare commented 3 years ago

For several applications it would be nice to have an index of "how deep" a plugin is.

For example for the datastructure in the straxen documentation we need such a depth to build the plugin tree: https://straxen.readthedocs.io/en/latest/reference/datastructure_nT.html. Another example is bootstrax, where we would like to sort the plugins to have the highest level targets first.

One complication might be that a plugin tree might not might not (in fact amost never) has one single final plugin. As such the "tree" might be ambiguous and a forest might be the better description.

jmosbacher commented 3 years ago

@jorana im not sure i fully follow what it is you mean by "how deep" a plugin is but maybe i can help. I can think of two things you may be referring. Maybe the DAG shortest path between a given data type and a root? or the length of a topological sort of the DAG? Both would be pretty trivial to add using networkx or pythons builtin graphlib.

try this:

import networkx as nx

dag = nx.DiGraph()

# Here i add all dependencies in the registry as an example, but i guess you would want to only add the dependency tree of a single plugin        
for name, plugin in st._plugin_class_registry.items():
    for p in plugin().depends_on:
        dag.add_edge(plugin.__name__, st._plugin_class_registry[p].__name__)

# Display the DAG
nx.draw(
    dag,
    with_labels=True,
    pos=nx.drawing.nx_agraph.graphviz_layout(
        dag, prog='dot',),
 node_size=800,)

# Shortest path
nx.shortest_path(dag, "Events", "DAQReader", )

# Topological sort
list(nx.topological_sort(dag))

Note that the topological sort is not unique, it represents a possible order in which you can process the DAG for which all dependencies are completed before dependents. If you want to define a unique "depth" you should look into the nx.lexicographical_topological_sort where you can pass a function that breaks the ties when more than one ordering is possible.

Is this the kind of thing you were thinking of?

JoranAngevaare commented 3 years ago

Hi Yossi, very nice example! Yes I think the enumerated output of list(nx.lexicographical_topological_sort (dag)) would be what I was thinking of, that way, one can easily do compare another the order of plugins. You don't need to have an unique solution (e.g. I don't care if events_mv > events_nv as long as I can figure out that events_nv > raw_records_nv).

So this would perfectly usable:

{p: i for p, i in enumerate(list(nx.lexicographical_topological_sort (dag)))}
{0: 'EventInfoDouble',
 1: 'DistinctChannels',
 2: 'EventInfo',
 3: 'EnergyEstimates',
...
 37: 'nVETORecorder',
 38: 'DAQReader'}

Perhaps this is much easier than what you thought I meant? I was thinking of writing this down in a simple dict / list computation but this is much more than that.

By the way for straxen, we use a similar package, graphviz. https://github.com/XENONnT/straxen/blob/master/docs/source/build_datastructure_doc.py#L174