astrolabsoftware / spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
https://astrolabsoftware.github.io/spark3D/
Apache License 2.0

On the partitioning: Later iterations in Octree are slower than expected #74

Closed: JulienPeloton closed this issue 6 years ago

JulienPeloton commented 6 years ago

OS: CentOS Linux release 7.4.1708 (Core)
spark3D: 0.1.4
spark-fits: 0.6.0

#72 adds a script to benchmark the partitioning. The idea is as follows:

1) Load the data (10 million rows) using spark-fits.
2) Apply the partitioning to the RDD, or not.
3) Trigger an action and repeat it several times, caching the data on the first iteration.
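
The protocol above can be sketched without Spark as a small timing harness. This is a stdlib-only Scala sketch under loud assumptions, not the script from #72: `loadAndPartition` is a hypothetical stand-in for loading with spark-fits and applying a spark3D partitioner, and a `lazy val` plays the role of `cache()`, so that only the first iteration pays the load and partitioning cost.

```scala
object PartitionBenchmarkSketch {

  /** Runs `action` `n` times and returns the wall-clock duration of each run, in milliseconds. */
  def timeIterations[A](n: Int)(action: => A): Seq[Double] =
    (1 to n).map { _ =>
      val start = System.nanoTime()
      action
      (System.nanoTime() - start) / 1e6
    }

  /** Hypothetical stand-in for "load + partition": deals point ids into `numPartitions` buckets. */
  def loadAndPartition(numPoints: Int, numPartitions: Int): Vector[Vector[Int]] =
    (0 until numPoints).toVector
      .groupBy(_ % numPartitions)
      .toVector
      .sortBy(_._1)
      .map(_._2)

  def main(args: Array[String]): Unit = {
    // A lazy val mimics cache(): the first access pays the load/partition cost,
    // later accesses reuse the already materialised partitions.
    lazy val cached = loadAndPartition(numPoints = 1000000, numPartitions = 100)
    // The "action" counts elements across all partitions, in the spirit of RDD.count().
    val durations = timeIterations(3)(cached.map(_.size).sum)
    durations.zipWithIndex.foreach { case (ms, i) =>
      println(f"iteration ${i + 1}: $ms%.1f ms")
    }
  }
}
```

Timing a by-name block keeps the harness independent of what the action is; with Spark, the timed action would be an RDD `count()`.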

Once the data is repartitioned and cached, later iterations are expected to be fast. Later iterations are indeed faster with both the onion and the octree partitioning, but they are noticeably slower with octree than with onion:

Onion:

Job Id   Description                      Duration
3        count at Partitioning.scala:81   0.2 s
2        count at Partitioning.scala:81   0.2 s
1        count at Partitioning.scala:81   1.0 min

Octree:

Job Id   Description                      Duration
3        count at Partitioning.scala:81   6 s
2        count at Partitioning.scala:81   5 s
1        count at Partitioning.scala:81   1.2 min

Is that expected?
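
As a rough reading of the two tables (converting 1.0 min to 60 s and 1.2 min to 72 s), a few lines of Scala compute how much the cached iterations speed up relative to the first one, and how much slower octree is than onion once the data is cached. These are only ratios of the single reported run, not new measurements; `ReportedTimings` and its methods are illustrative names.

```scala
object ReportedTimings {
  // Durations in seconds, copied from the tables above (job 1 is the first, uncached iteration).
  val onion: Map[Int, Double]  = Map(1 -> 60.0, 2 -> 0.2, 3 -> 0.2)
  val octree: Map[Int, Double] = Map(1 -> 72.0, 2 -> 5.0, 3 -> 6.0)

  /** How many times faster a cached job is than the first, uncached job. */
  def speedUp(t: Map[Int, Double], job: Int): Double = t(1) / t(job)

  /** How many times slower octree is than onion for the same job. */
  def octreeOverOnion(job: Int): Double = octree(job) / onion(job)

  def main(args: Array[String]): Unit =
    for (job <- Seq(2, 3))
      println(f"job $job: onion speed-up ${speedUp(onion, job)}%.0fx, " +
        f"octree speed-up ${speedUp(octree, job)}%.1fx, octree/onion ${octreeOverOnion(job)}%.0fx")
}
```

On the reported numbers, onion's cached jobs are about 300x faster than its first job, octree's only about 12 to 14x, and cached octree jobs take 25 to 30x longer than cached onion jobs.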

mayurdb commented 6 years ago

Taking a look!

JulienPeloton commented 6 years ago

This seems to have been a fluke: recent benchmarks no longer show this effect. Closing.