#72 adds a script to benchmark the partitioning. The idea is the following:

1) Load the data using spark-fits (10 million elements)
2) Apply partitioning, or not, to the RDD
3) Trigger an action and repeat it several times (the data is put in cache on the first iteration)
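For readers who want to reproduce the pattern, the steps above can be sketched roughly as follows. This is not the script from the pull request: it is a minimal, assumption-laden sketch that replaces the spark-fits load with synthetic points and uses Spark's generic `HashPartitioner` as a stand-in for the onion and octree partitioners. The object name, the `timed` helper, the `repartition` argument, and the data size are all hypothetical.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object PartitioningBenchmarkSketch {
  // Hypothetical helper: run a block and return (result, elapsed seconds).
  def timed[T](block: => T): (T, Double) = {
    val t0 = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - t0) / 1e9)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioning-benchmark-sketch")
      .master("local[*]")
      .getOrCreate()

    // Step 1 (stand-in): synthetic 3D points instead of a spark-fits load.
    val n = 1000000 // assumed size, smaller than the 10 million in the issue
    val points = spark.sparkContext
      .parallelize(0 until n)
      .map { i =>
        val r = new scala.util.Random(i)
        (r.nextDouble(), r.nextDouble(), r.nextDouble())
      }

    // Step 2: optionally repartition. HashPartitioner on a coarse bin of the
    // third coordinate is only a placeholder for onion/octree partitioning.
    val repartition = args.headOption.contains("repartition")
    val rdd =
      if (repartition)
        points
          .keyBy(p => (p._3 * 10).toInt)
          .partitionBy(new HashPartitioner(10))
          .values
      else points

    // Step 3: cache, then trigger the same action several times.
    // The first count pays for computing and caching the data.
    rdd.cache()
    for (i <- 1 to 3) {
      val (count, seconds) = timed(rdd.count())
      println(f"iteration $i: count=$count, $seconds%.2f s")
    }

    spark.stop()
  }
}
```

With a cached RDD, the second and third iterations should only read from memory, so any large difference between partitioners at that stage points at how the cached partitions are laid out rather than at the load itself.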

Once the data is repartitioned and cached, one expects later iterations to be fast. Later iterations are indeed faster for both onion and octree, but they are much slower for octree (5 to 6 s) than for onion (0.2 s):

Onion:

Job Id	Description	Duration
3	count at Partitioning.scala:81	0.2 s
2	count at Partitioning.scala:81	0.2 s
1	count at Partitioning.scala:81	1.0 min

Octree:

Job Id	Description	Duration
3	count at Partitioning.scala:81	6 s
2	count at Partitioning.scala:81	5 s
1	count at Partitioning.scala:81	1.2 min

Is that expected?

mayurdb commented 6 years ago

Taking a look!

JulienPeloton commented 6 years ago

This seems to have been a fluke: recent benchmarks no longer show this effect. Closing.

astrolabsoftware / spark3D

On the partitioning: Later iterations in Octree are slower than expected #74