#72 adds a script to benchmark the partitioning. The idea is the following:

1) Load the data using spark-fits (10 million points).
2) Apply partitioning, or not, to the RDD.
3) Trigger an action and repeat it several times (the data is put in cache the first time).

Just printing the number of elements of the repartitioned RDD:

    // Load the data
    val options = Map("hdu" -> hdu)
    val pRDD = new Point3DRDD(spark, fn_fits, columns, isSpherical, "fits", options)

    // Partition it
    val rdd = mode match {
        case "nopart" => pRDD.rawRDD.cache()
        case "octree" => pRDD.spatialPartitioning(GridType.OCTREE).cache()
        case "onion" => pRDD.spatialPartitioning(GridType.LINEARONIONGRID).cache()
        case _ => throw new AssertionError("Choose between nopart, onion, or octree for the partitioning.")
    }

    // Repeat the action a few times to minimize flukes (the first count fills the cache)
    for (i <- 0 to 2) {
      val number = rdd.count()
      println(s"Number of points ($mode) : $number")
    }
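Since this is a benchmark, each iteration of the loop above could also report how long the action took. A minimal sketch, assuming a hypothetical `timed` helper that is not part of spark3D or the benchmark script; a plain collection stands in for the RDD so the snippet runs without Spark:

```scala
object TimedCount {
  // Hypothetical helper (not from spark3D): evaluate a block once and
  // return its result together with the elapsed wall-clock time in ms.
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the cached RDD; in the benchmark the timed block
    // would be the Spark action, e.g. rdd.count().
    val data = (1 to 1000).toVector
    for (i <- 0 to 2) {
      val (number, ms) = timed(data.size.toLong)
      println(f"Iteration $i: $number points in $ms%.3f ms")
    }
  }
}
```

With a cached RDD, the first iteration would be expected to include the cost of loading and partitioning, and the later ones only the cost of the action itself.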

I obtain:

    Number of points (nopart) : 10000000
    Number of points (octree) : 9999995
    Number of points (onion) : 10000000

Weird?
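The octree count is 5 points short of the input. One way to narrow this down would be to list which elements disappear, not just how many. A minimal sketch on plain Scala collections, with a hypothetical `dropped` helper and made-up sample points; on RDDs the analogous call would be `before.subtract(after)`, assuming both RDDs hold the same element type:

```scala
object PartitionCheck {
  // Hypothetical helper: elements present in `before` but missing from `after`
  // (multiset difference, so duplicates are accounted for).
  def dropped[A](before: Seq[A], after: Seq[A]): Seq[A] = before.diff(after)

  def main(args: Array[String]): Unit = {
    // Made-up points for illustration only, not taken from the benchmark data.
    val before = Seq((0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
    val after = Seq((0.0, 0.0, 0.0), (4.0, 5.0, 6.0))
    val lost = dropped(before, after)
    println(s"Lost ${before.size - after.size} point(s): $lost")
  }
}
```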

mayurdb commented 6 years ago

Created #76 for resolving this issue

JulienPeloton commented 6 years ago

Checked!

Number of points (octree) : 10000000

astrolabsoftware / spark3D

On the partitioning: Octree partitioning does not keep all elements #75
