astrolabsoftware / spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
https://astrolabsoftware.github.io/spark3D/
Apache License 2.0

Octree partitioning not working from spatialPartitioning #41

Closed. JulienPeloton closed this issue 6 years ago.

JulienPeloton commented 6 years ago

I tried this simple script:

import com.spark3d.spatial3DRDD._
import org.apache.spark.sql.SparkSession
import com.spark3d.utils.GridType

val spark = SparkSession.builder().appName("OctreeSpace").getOrCreate()

val fn = "src/test/resources/cartesian_spheres.fits"
val hdu = 1
val columns = "x,y,z,radius"
val spherical = false

// Load the data
val sphereRDD = new SphereRDDFromFITS(spark, fn, hdu, columns, spherical)

// Re-partition the space using OCTREE
val sphereRDD_part = sphereRDD.spatialPartitioning(GridType.OCTREE)

and here is the log:

org.apache.spark.SparkDriverExecutionException: Execution error
  at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1729)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
  at org.apache.spark.rdd.RDD$$anonfun$takeSample$1.apply(RDD.scala:578)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.takeSample(RDD.scala:557)
  at com.spark3d.spatial3DRDD.Shape3DRDD.spatialPartitioning(Shape3DRDD.scala:113)
  ... 50 elided
Caused by: java.lang.ArrayStoreException
lastException: Throwable = null

I tested it both before and after the big change in https://github.com/JulienPeloton/spark3D/pull/40, so the problem most likely predates it. Any ideas @mayurdb before I dig into the code?
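
For context, here is a minimal, Spark-free sketch of the kind of situation that raises a java.lang.ArrayStoreException on the JVM: an element is stored into an array whose runtime component type is narrower than the element's class. The names below are purely illustrative and are not the actual spark3D types.

// Minimal sketch (illustrative names, not spark3D classes):
// a JVM array keeps its runtime component type, so storing an
// incompatible element into it throws java.lang.ArrayStoreException.
trait Shape
class Sphere extends Shape
class Box extends Shape

object ArrayStoreDemo {
  def main(args: Array[String]): Unit = {
    // Runtime type is Array[Sphere], but we view it as Array[Shape];
    // the cast succeeds because of JVM array covariance.
    val shapes: Array[Shape] = Array(new Sphere).asInstanceOf[Array[Shape]]

    // Compiles fine (Box <: Shape), but the underlying array can only hold Sphere:
    shapes(0) = new Box // throws java.lang.ArrayStoreException at runtime
  }
}

In Spark, collect() and takeSample() allocate the driver-side result array from the RDD's ClassTag and fill it in the result handler, which runs inside handleTaskCompletion; that would be consistent with the SparkDriverExecutionException wrapping the ArrayStoreException in the trace above. A mismatch between the captured ClassTag and the actual element classes (e.g., after an unchecked cast of the RDD) could surface exactly like this, but this is only a guess, not a confirmed diagnosis.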

JulienPeloton commented 6 years ago

Done in #56 and #45