apache / incubator-wayang

Apache Wayang (incubating) is the first cross-platform data processing system.
https://wayang.incubator.apache.org/
Apache License 2.0

Fix CardinalityRepository to sample measured cardinalities #412

juripetersen closed this 4 months ago

juripetersen commented 4 months ago

Closes #411

juripetersen commented 4 months ago

The desired output looks similar to this (one JSON record per line):

{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":206,"upperBound":206,"confidence":1.0}],"operator":{"class":"org.apache.wayang.spark.operators.SparkFlatMapOperator"},"output":{"name":"out","index":0,"cardinality":1759}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":206,"upperBound":206,"confidence":1.0}],"operator":{"class":"org.apache.wayang.basic.operators.FlatMapOperator"},"output":{"name":"out","index":0,"cardinality":1759}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":206,"upperBound":206,"confidence":1.0}],"operator":{"class":"org.apache.wayang.java.operators.JavaFlatMapOperator"},"output":{"name":"out","index":0,"cardinality":1759}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1759,"upperBound":1759,"confidence":1.0}],"operator":{"class":"org.apache.wayang.basic.operators.FilterOperator"},"output":{"name":"out","index":0,"cardinality":1611}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1759,"upperBound":1759,"confidence":1.0}],"operator":{"class":"org.apache.wayang.spark.operators.SparkFilterOperator"},"output":{"name":"out","index":0,"cardinality":1611}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1759,"upperBound":1759,"confidence":1.0}],"operator":{"class":"org.apache.wayang.java.operators.JavaFilterOperator"},"output":{"name":"out","index":0,"cardinality":1611}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1611,"upperBound":1611,"confidence":1.0}],"operator":{"class":"org.apache.wayang.basic.operators.MapOperator"},"output":{"name":"out","index":0,"cardinality":1611}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1611,"upperBound":1611,"confidence":1.0}],"operator":{"class":"org.apache.wayang.spark.operators.SparkMapOperator"},"output":{"name":"out","index":0,"cardinality":1611}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1611,"upperBound":1611,"confidence":1.0}],"operator":{"class":"org.apache.wayang.java.operators.JavaMapOperator"},"output":{"name":"out","index":0,"cardinality":1611}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1611,"upperBound":1611,"confidence":1.0}],"operator":{"class":"org.apache.wayang.basic.operators.ReduceByOperator"},"output":{"name":"out","index":0,"cardinality":493}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1611,"upperBound":1611,"confidence":1.0}],"operator":{"class":"org.apache.wayang.spark.operators.SparkReduceByOperator"},"output":{"name":"out","index":0,"cardinality":493}}
{"inputs":[{"name":"in","index":0,"isBroadcast":false,"lowerBound":1611,"upperBound":1611,"confidence":1.0}],"operator":{"class":"org.apache.wayang.java.operators.JavaReduceByOperator"},"output":{"name":"out","index":0,"cardinality":493}}
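Each record above pairs an operator class with its input cardinality bounds and its measured output cardinality. As a hedged illustration of how such a log might be consumed, the Java sketch below uses a hypothetical class `CardinalityLogSketch` (not part of Wayang) that extracts those fields with regular expressions tailored to exactly the single-input layout shown, and computes a selectivity as output cardinality divided by input lower bound (for the FlatMap records, 1759 / 206 ≈ 8.54). A real consumer would more likely use a proper JSON parser and handle operators with several inputs.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Wayang) that reads cardinality log lines like the ones above.
 * The regexes assume the exact single-input layout of those lines; use a JSON parser in practice.
 */
public class CardinalityLogSketch {

    /** One logged measurement: operator class, input cardinality bounds and measured output cardinality. */
    public record Entry(String operatorClass, long inputLowerBound, long inputUpperBound, long outputCardinality) {

        /** Output cardinality divided by the input lower bound (equal to the upper bound in the sample). */
        public double selectivity() {
            return (double) outputCardinality / inputLowerBound;
        }
    }

    private static final Pattern CLASS = Pattern.compile("\"class\":\"([^\"]+)\"");
    private static final Pattern LOWER = Pattern.compile("\"lowerBound\":(\\d+)");
    private static final Pattern UPPER = Pattern.compile("\"upperBound\":(\\d+)");
    // Anchored on "output" so that only the output port's cardinality is captured.
    private static final Pattern OUTPUT = Pattern.compile("\"output\":\\{[^}]*\"cardinality\":(\\d+)");

    /** Returns the first capture group of the pattern, or fails if the field is missing. */
    private static String field(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Missing field " + pattern.pattern() + " in: " + line);
        }
        return m.group(1);
    }

    /** Parses one log line into an Entry. */
    public static Entry parse(String line) {
        return new Entry(
                field(CLASS, line),
                Long.parseLong(field(LOWER, line)),
                Long.parseLong(field(UPPER, line)),
                Long.parseLong(field(OUTPUT, line)));
    }

    public static void main(String[] args) {
        // Two records copied from the sample output above.
        List<String> lines = List.of(
                "{\"inputs\":[{\"name\":\"in\",\"index\":0,\"isBroadcast\":false,\"lowerBound\":206,\"upperBound\":206,\"confidence\":1.0}],"
                        + "\"operator\":{\"class\":\"org.apache.wayang.spark.operators.SparkFlatMapOperator\"},"
                        + "\"output\":{\"name\":\"out\",\"index\":0,\"cardinality\":1759}}",
                "{\"inputs\":[{\"name\":\"in\",\"index\":0,\"isBroadcast\":false,\"lowerBound\":1611,\"upperBound\":1611,\"confidence\":1.0}],"
                        + "\"operator\":{\"class\":\"org.apache.wayang.basic.operators.ReduceByOperator\"},"
                        + "\"output\":{\"name\":\"out\",\"index\":0,\"cardinality\":493}}");
        for (String line : lines) {
            Entry e = parse(line);
            System.out.printf(Locale.ROOT, "%s: %d -> %d (selectivity %.3f)%n",
                    e.operatorClass(), e.inputLowerBound(), e.outputCardinality(), e.selectivity());
        }
    }
}
```

Regular expressions keep the sketch dependency-free, at the cost of being brittle: any change to key order or nesting in the log format would require updating the patterns.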

Producing this output requires a configuration like the following:

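import org.apache.wayang.core.api.Configuration;
Configuration config = new Configuration();
// Enable logging; filePath is the file the cardinality records are written to
config.setProperty("wayang.core.log.enabled", "true");
config.setProperty("wayang.core.log.cardinalities", filePath);
// Instrument all operators so that their actual cardinalities are measured
config.setProperty("wayang.core.optimizer.instrumentation", "org.apache.wayang.core.profiling.FullInstrumentationStrategy");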