Issues stitching .h5 dataset using SGE

Hello, I am attempting to run Pairwise Stitching on an SGE cluster. I realize that the code is untested there (at least according to the online tutorial), but I have it working well on our cluster with the example data (N5/XML). With my .h5 dataset, some of the tasks do seem to execute, since I see pairwise shift output in the worker stdout. However, the driver sends SIGKILL to my workers after they run into the following error:

2024-07-30 15:06:43,468 [Executor task launch worker for task 6.0 in stage 0.0 (TID 6)] WARN [BlockManager]: Putting block rdd_1_6 failed due to exception java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "e" is null.
2024-07-30 15:06:43,468 [Executor task launch worker for task 6.0 in stage 0.0 (TID 6)] WARN [BlockManager]: Block rdd_1_6 could not be removed as it was not found on disk or in memory
2024-07-30 15:06:43,470 [Executor task launch worker for task 6.0 in stage 0.0 (TID 6)] ERROR [Executor]: Exception in task 6.0 in stage 0.0 (TID 6)
java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "e" is null
    at net.preibisch.mvrecon.process.interestpointregistration.pairwise.constellation.grouping.Group.combineOrSplitBy(Group.java:232)
    at net.preibisch.mvrecon.process.interestpointregistration.pairwise.constellation.grouping.Group.combineBy(Group.java:170)
    at net.preibisch.stitcher.algorithm.GroupedViewAggregator$Action.pickBrightest(GroupedViewAggregator.java:119)
    at net.preibisch.stitcher.algorithm.GroupedViewAggregator$Action.aggregate(GroupedViewAggregator.java:102)
    at net.preibisch.stitcher.algorithm.GroupedViewAggregator.aggregate(GroupedViewAggregator.java:396)
    at net.preibisch.stitcher.algorithm.globalopt.TransformationTools.computeStitching(TransformationTools.java:277)
    at net.preibisch.bigstitcher.spark.SparkPairwiseStitching.lambda$call$c9354d04$1(SparkPairwiseStitching.java:200)
    at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:224)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:302)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1597)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
    at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

I suspect that I may need to refine the Spark setup to better fit my cluster configuration, but I can't tell whether this is a memory error or an issue with my input data. I can provide any other logs that may be helpful.
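To help tell the two cases apart, here is a minimal sketch of how the worker logs could be triaged. It assumes that heap exhaustion would show up as `java.lang.OutOfMemoryError` somewhere in the log, which may not hold if the scheduler kills the process from outside the JVM. The function names, the pattern table, and the `net.preibisch` prefix filter are illustrative choices of mine, not part of BigStitcher-Spark.

```python
import re
from collections import Counter

# Hypothetical pattern table: adjust to whatever your cluster actually logs.
PATTERNS = {
    "memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "null_pointer": re.compile(r"java\.lang\.NullPointerException"),
    "killed": re.compile(r"SIGKILL|SIGTERM"),
}


def classify_log(text):
    """Count how many log lines match each failure category."""
    counts = Counter()
    for line in text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def first_app_frame(text, prefix="net.preibisch"):
    """Return the first stack frame whose class starts with `prefix`, or None."""
    match = re.search(
        r"at (" + re.escape(prefix) + r"[\w.$]+)\(([\w.]+):(\d+)\)", text
    )
    if match is None:
        return None
    return f"{match.group(1)}({match.group(2)}:{match.group(3)})"
```

Run over the excerpt above, this would report only `null_pointer` hits and point at `Group.combineOrSplitBy(Group.java:232)` as the first application frame, which suggests (but does not prove) that the failure originates in the grouping code rather than in memory pressure.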

Thanks!

Edit: It may be worth noting that Pairwise Stitching runs fine on this dataset when I use BigStitcher interactively in Fiji.

JaneliaSciComp / BigStitcher-Spark
