OS: CentOS Linux release 7.4.1708 (Core)
spark3D: 0.1.4
spark-fits: 0.6.0
72 adds a script to benchmark the partitioning. The idea is the following:
1) Load the data using spark-fits (10 million points)
2) Apply a partitioning, or none, to the RDD
3) Trigger an action and repeat it several times (the data is cached on the first pass)
The script just prints the number of elements of the repartitioned RDD:
// Load the data
val options = Map("hdu" -> hdu)
val pRDD = new Point3DRDD(spark, fn_fits, columns, isSpherical, "fits", options)

// Partition it
val rdd = mode match {
  case "nopart" => pRDD.rawRDD.cache()
  case "octree" => pRDD.spatialPartitioning(GridType.OCTREE).cache()
  case "onion" => pRDD.spatialPartitioning(GridType.LINEARONIONGRID).cache()
  case _ => throw new AssertionError("Choose between nopart, onion, or octree for the partitioning.")
}

// MC it to minimize flukes
for (i <- 0 to 2) {
  val number = rdd.count()
  println(s"Number of points ($mode) : $number")
}
I obtain:
Number of points (nopart) : 10000000
Number of points (octree) : 9999995
Number of points (onion) : 10000000
Weird? The octree partitioning returns 5 points fewer than the other two modes.
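One way to narrow this down would be to list the points that end up in no cell at all. The snippet below is only a minimal sketch of that check in plain Scala, with no Spark involved. `Point`, `Box`, `containsHalfOpen` and `LostPoints` are hypothetical stand-ins and not spark3D classes, and the half-open containment test is an assumption about what could go wrong, not something I have verified in the octree code. Under that assumption, points lying exactly on the upper faces of the bounding box are accepted by no cell and silently disappear from the count.

```scala
// Toy stand-ins for illustration only: these are NOT spark3D classes.
final case class Point(x: Double, y: Double, z: Double)

final case class Box(min: Point, max: Point) {
  // Half-open test (assumption): the upper faces of the box are excluded.
  def containsHalfOpen(p: Point): Boolean =
    p.x >= min.x && p.x < max.x &&
    p.y >= min.y && p.y < max.y &&
    p.z >= min.z && p.z < max.z
}

object LostPoints {
  // Tightest axis-aligned box around the data.
  def boundingBox(points: Seq[Point]): Box = Box(
    Point(points.map(_.x).min, points.map(_.y).min, points.map(_.z).min),
    Point(points.map(_.x).max, points.map(_.y).max, points.map(_.z).max)
  )

  // Points that no cell accepts, i.e. points a partitioner would drop.
  def unassigned(points: Seq[Point], cells: Seq[Box]): Seq[Point] =
    points.filterNot(p => cells.exists(_.containsHalfOpen(p)))

  def main(args: Array[String]): Unit = {
    val points = Seq(
      Point(0.0, 0.0, 0.0),
      Point(1.0, 2.0, 3.0),
      Point(4.0, 4.0, 4.0), // sits on the upper corner of the bounding box
      Point(2.0, 4.0, 1.0)  // sits on the upper y face of the bounding box
    )
    val cells = Seq(boundingBox(points)) // a single "cell" covering all the data
    val lost = unassigned(points, cells)
    println(s"kept ${points.size - lost.size} of ${points.size}, lost: $lost")
  }
}
```

If the real data behaves the same way, the handful of missing points should all sit on the boundary of the data set, which would be easy to check by comparing them with the minimum and maximum of each coordinate.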