LucidDB / luciddb

DEFUNCT: See README
https://github.com/LucidDB/luciddb
Apache License 2.0
53 stars 24 forks source link

[FNL-59] ExecStreamGraph doesn't get built with correct flow output ordering #443

Open dynamobi-build opened 12 years ago

dynamobi-build commented 12 years ago

[reporter="jvs", created="Mon, 4 Dec 2006 17:46:36 -0500 (GMT-05:00)"] Up until now, ExecStreamBuilder only preserves input flow ordering (even though Farrago and FEM transmit both input and output flow ordering to Fennel). Zelaine is putting in a stopgap to make it handle the case of preserving output flow ordering for streams which have only one input. For edges connecting streams where the producer has multiple outputs and the consumer has multiple inputs, the output ordering will be set correctly, and the input ordering may not be (whereas it would have been before).

A full solution requires changes to ExecStreamGraph's data structures. What we'd like to do is use the original ExecStreamBuilder logic (add edges according to input flow ordering) but annotate each edge with the correct output flow ordinal. Then, during ExecStreamGraphImpl::prepare, sort each vertex's list of output edges according to the annotated ordinal attribute.

From reading the Boost docs, I haven't been able to figure out whether it's actually safe/possible to use std::sort on the iterator pair returned by boost::out_edges (with a comparator to impose the order transmitted down from Java, which we would need to keep associated with the edge before the sort).

http://www.boost.org/libs/graph/doc/adjacency_list.html

The closest mention is here (section Structure Modification):

"The placement of the new edge in the out-edge list is in general unspecified, though ordering of the out-edge list can be accomplished through the choice of OutEdgeList."

I.e. our usage of std::vector for the edge lists will maintain insertion order, but that doesn't answer whether reordering existing edge lists will work (and reading the template code is not that easy).

I verified that it's at least possible to compile code which uses the returned iterators to swap edges, but I don't know if it will corrupt other data structures inside the graph.

If it doesn't work, then we would probably need to maintain redundant sorted edge vectors per-vertex and use those for output-order-sensitive graph accessors.

Either way, we also need to take into account how this interacts with post-prepare graph edits used by Disruptive Tech.

dynamobi-build commented 12 years ago

[author="mberkowitz", created="Mon, 4 Dec 2006 18:29:10 -0500 (GMT-05:00)"] I presume that the order of inputs and/or ouputs only matters when the ExecStream treats the inputs or outputs differently, and infers which is which by its ordinal postion in the input or output set.
When DT merges or unmerges stream graphs, the junction vertices are currently always GlobalOrderedNexuses,
which treat all inputs the same, and all outputs the same, so there doesnt seem to be a problem here.

dynamobi-build commented 12 years ago

[author="jvs", created="Fri, 12 Jan 2007 11:35:41 -0500 (GMT-05:00)"] My recent graph structure change for implicit dependencies involved adding an edge weight to represent implicit vs explicit. Currently the weight is 0 for implicit and 1 for explicit, but we could use a positive weight to represent 1-based explicit edge order (or change implicit to -1 and use 0-based). Implicit dataflow edges should never need ordering.

dynamobi-build commented 12 years ago

[author="jvs", created="Fri, 12 Jan 2007 13:50:05 -0500 (GMT-05:00)"] Using the weight property is an abuse; it's likely to cause trouble if someone tries to use a graph algorithm which is weight-dependent (like shortest path). I should brave the templates and figure out how to introduce my own user-defined property.