apache / incubator-wayang

Apache Wayang (incubating) is the first cross-platform data processing system.
https://wayang.incubator.apache.org/
Apache License 2.0

Add explain command to display debug information for inflated wayang plans and execution plans #421

Closed by juripetersen 6 months ago

juripetersen commented 6 months ago

References #417

Users can define WayangPlans as usual, but instead of executing them, they can now use an explain command:

// Print debug information to the console:
WayangContext.explain(wayangPlan, udfJars);

// Write the plans to JSON files for later use:
WayangContext.explain(wayangPlan, true, udfJars);

Example console output for WordCount:

== Wayang Plan ==
-+ [Alternative[[LocalCallbackSink[Collect result]]], Alternative[[JavaLocalCallbackSink[Collect result]]], Alternative[[SparkLocalCallbackSink[Collect result]]]]
  -+ [Alternative[[ReduceBy[Add counters]]], Alternative[[JavaReduceBy[Add counters]]], Alternative[[SparkReduceBy[Add counters]]]]
    -+ [Alternative[[Map[To lower case, add counter]]], Alternative[[JavaMap[To lower case, add counter]]], Alternative[[SparkMap[To lower case, add counter]]]]
      -+ [Alternative[[Filter[Filter empty words]]], Alternative[[JavaFilter[Filter empty words]]], Alternative[[SparkFilter[Filter empty words]]]]
        -+ [Alternative[[FlatMap[Split words]]], Alternative[[JavaFlatMap[Split words]]], Alternative[[SparkFlatMap[Split words]]]]
          -+ [Alternative[[TextFileSource[Load file]]], Alternative[[JavaTextFileSource[Load file]]], Alternative[[SparkTextFileSource[Load file]]]]

== Execution Plan ==
-+ JavaLocalCallbackSink[Collect result]
  -+ JavaReduceBy[Add counters]
    -+ JavaMap[To lower case, add counter]
      -+ JavaFilter[Filter empty words]
        -+ JavaFlatMap[Split words]
          -+ JavaTextFileSource[Load file]
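For readers who want to see how output in this shape can be produced, here is a minimal sketch of a depth-first tree printer. It is not the code from this PR: `PlanNode` and `PlanPrinter` are hypothetical stand-ins, and the format (a `-+ ` prefix, two spaces of indentation per level) is inferred from the sample output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan node; NOT Wayang's actual ExplainTreeNode.
final class PlanNode {
    final String label;
    final List<PlanNode> children = new ArrayList<>();

    PlanNode(String label) {
        this.label = label;
    }

    // Appends a child and returns this node so calls can be chained.
    PlanNode add(PlanNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical printer that mimics the layout of the explain output.
final class PlanPrinter {
    // Renders the tree depth-first: one "-+ label" line per node,
    // indented by two spaces per nesting level.
    static String render(PlanNode root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    private static void render(PlanNode node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append("-+ ").append(node.label).append('\n');
        for (PlanNode child : node.children) {
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Labels copied from the execution plan above, shortened to three levels.
        PlanNode plan = new PlanNode("JavaLocalCallbackSink[Collect result]")
            .add(new PlanNode("JavaReduceBy[Add counters]")
                .add(new PlanNode("JavaMap[To lower case, add counter]")));
        System.out.print(render(plan));
    }
}
```

Starting at the sink and descending toward the source gives the same top-to-bottom order as the execution plan shown above.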

This PR adds the class ExplainUtils, which returns ExplainTreeNodes that can be used for further processing (for example, exporting to other file formats). It lets callers choose upstream or downstream traversal of a plan, so users can build trees that fit their needs.
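To illustrate what choosing a traversal direction means, the sketch below walks a small operator graph either toward its sources (upstream) or toward its sinks (downstream). It does not use the ExplainUtils API: `OperatorGraph`, `Direction`, `connect`, and `walk` are hypothetical names, and the graph is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical operator graph; NOT part of Wayang or ExplainUtils.
final class OperatorGraph {
    enum Direction { UPSTREAM, DOWNSTREAM }

    // consumer -> producers feeding it
    private final Map<String, List<String>> inputs = new HashMap<>();
    // producer -> consumers reading from it
    private final Map<String, List<String>> outputs = new HashMap<>();

    // Records that data flows from `producer` into `consumer`.
    void connect(String producer, String consumer) {
        outputs.computeIfAbsent(producer, k -> new ArrayList<>()).add(consumer);
        inputs.computeIfAbsent(consumer, k -> new ArrayList<>()).add(producer);
    }

    // Returns one indented line per operator reachable from `start`,
    // in depth-first order. Assumes the graph has no cycles.
    List<String> walk(String start, Direction direction) {
        List<String> lines = new ArrayList<>();
        walk(start, direction, 0, lines);
        return lines;
    }

    private void walk(String op, Direction direction, int depth, List<String> lines) {
        lines.add("  ".repeat(depth) + "-+ " + op);
        Map<String, List<String>> edges =
            direction == Direction.UPSTREAM ? inputs : outputs;
        for (String next : edges.getOrDefault(op, List.of())) {
            walk(next, direction, depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        OperatorGraph graph = new OperatorGraph();
        graph.connect("TextFileSource", "FlatMap");
        graph.connect("FlatMap", "LocalCallbackSink");

        // Upstream: start at the sink and follow inputs back to the source.
        graph.walk("LocalCallbackSink", Direction.UPSTREAM).forEach(System.out::println);
        // Downstream: start at the source and follow outputs to the sink.
        graph.walk("TextFileSource", Direction.DOWNSTREAM).forEach(System.out::println);
    }
}
```

An upstream walk from the sink matches the sink-first layout of the example output, while a downstream walk from the source gives the reverse nesting, which may read more naturally for some visualizations.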

2pk03 commented 6 months ago

@juripetersen, could you also please write the documentation parts and maybe a blogpost?

juripetersen commented 6 months ago

> @juripetersen, could you also please write the documentation parts and maybe a blogpost?

Sure, will do so after successfully merging this.