JaneliaSciComp / BigStitcher-Spark

Running compute-intense parts of BigStitcher distributed
BSD 2-Clause "Simplified" License

Issues stitching .h5 dataset using SGE #35

Open vbrow29 opened 2 months ago

vbrow29 commented 2 months ago

Hello, I am attempting to run Pairwise Stitching on an SGE cluster. I realize that the code is untested (at least according to the online tutorial), but I have it working well on our cluster with the example data (N5/XML). Some of the tasks do seem to be executing, since I see pairwise shift output in the worker stdout. However, after the workers run into the following error, the driver sends them SIGKILL:

2024-07-30 15:06:43,468 [Executor task launch worker for task 6.0 in stage 0.0 (TID 6)] WARN [BlockManager]: Putting block rdd_1_6 failed due to exception java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "e" is null.
2024-07-30 15:06:43,468 [Executor task launch worker for task 6.0 in stage 0.0 (TID 6)] WARN [BlockManager]: Block rdd_1_6 could not be removed as it was not found on disk or in memory
2024-07-30 15:06:43,470 [Executor task launch worker for task 6.0 in stage 0.0 (TID 6)] ERROR [Executor]: Exception in task 6.0 in stage 0.0 (TID 6)
java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "e" is null
    at net.preibisch.mvrecon.process.interestpointregistration.pairwise.constellation.grouping.Group.combineOrSplitBy(Group.java:232)
    at net.preibisch.mvrecon.process.interestpointregistration.pairwise.constellation.grouping.Group.combineBy(Group.java:170)
    at net.preibisch.stitcher.algorithm.GroupedViewAggregator$Action.pickBrightest(GroupedViewAggregator.java:119)
    at net.preibisch.stitcher.algorithm.GroupedViewAggregator$Action.aggregate(GroupedViewAggregator.java:102)
    at net.preibisch.stitcher.algorithm.GroupedViewAggregator.aggregate(GroupedViewAggregator.java:396)
    at net.preibisch.stitcher.algorithm.globalopt.TransformationTools.computeStitching(TransformationTools.java:277)
    at net.preibisch.bigstitcher.spark.SparkPairwiseStitching.lambda$call$c9354d04$1(SparkPairwiseStitching.java:200)
    at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:224)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:302)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1597)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
    at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
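For context on the error itself: on JDK 15 and later, a message of the form `Cannot invoke "Object.getClass()" because "e" is null` is a "helpful NullPointerException" message. It means a reference named `e` was null when `getClass()` was called at Group.java:232. That suggests a missing (null) entry reached the grouping code, rather than a memory problem, although the trace alone does not say where the null came from. The sketch below is hypothetical and is NOT BigStitcher's `Group.combineOrSplitBy`; the class `NullGroupMemberDemo` and its helpers are made up purely to reproduce the same failure mode and show a defensive null check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only: NOT BigStitcher code. It reproduces the same
// kind of NPE seen at Group.java:232 (calling getClass() on a null element).
public class NullGroupMemberDemo {

    // Collects the runtime class name of each element; throws
    // NullPointerException as soon as it reaches a null element.
    static List<String> classNamesOf(List<Object> elements) {
        List<String> names = new ArrayList<>();
        for (Object e : elements) {
            names.add(e.getClass().getSimpleName()); // NPE when e == null
        }
        return names;
    }

    // Defensive variant: fail early with a message that names the bad index.
    static List<String> classNamesOfChecked(List<Object> elements) {
        for (int i = 0; i < elements.size(); i++) {
            Objects.requireNonNull(elements.get(i), "element " + i + " is null");
        }
        return classNamesOf(elements);
    }

    public static void main(String[] args) {
        // Hypothetical input with one missing entry.
        List<Object> views = Arrays.<Object>asList("tile0", 1, null);
        try {
            classNamesOf(views);
        } catch (NullPointerException ex) {
            // On JDK 15+, something like: Cannot invoke "Object.getClass()" because "e" is null
            // (the name "e" appears only if compiled with -g; otherwise a placeholder like "<localN>")
            System.out.println(ex.getMessage());
        }
        try {
            classNamesOfChecked(views);
        } catch (NullPointerException ex) {
            System.out.println(ex.getMessage()); // element 2 is null
        }
    }
}
```

The checked variant does not fix anything by itself; it only moves the failure to a point where the offending entry is identified, which can make this kind of issue easier to trace back to the input data.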

I suspect that I may need to refine the Spark setup to better integrate with my cluster configuration, but I can't tell whether this is a memory error or an issue with my input data. I can provide any other logs that may be helpful.

Thanks!

Edit: It may be worth noting that for this dataset, Pairwise Stitching runs fine using the interactive BigStitcher in Fiji.

vbrow29 commented 6 days ago

For what it's worth, I think I have figured out the root cause of this issue. My image datasets are acquired on a hybrid open-top light-sheet microscope with a non-orthogonal dual-objective configuration: the illumination objective is oriented at 45° to the collection objective, so the images are saved with a different axis order than conventional microscopy images.

To fix the issue above, I loaded the dataset (HDF5 format) in the Fiji version of BigStitcher and used the "Interactively Reorient Sample" function to reorient my sample to a more traditional ZYX axis setup. I'm not entirely sure why this fixed the problem, but I'm posting it here in case anyone runs into something similar and is looking for a solution.
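Conceptually, reorienting a sample amounts to composing one extra rigid rotation into each view's transform. The plain-Java sketch below illustrates that idea with a row-packed 3×4 affine `[R | t]`; it is not BigStitcher's code, and the tilt axis (X), the 45° angle, and the helper names `rotationAboutX`, `concatenate`, and `apply` are assumptions chosen only to match the geometry described above.

```java
// Hypothetical sketch, not BigStitcher code: a 3x4 affine stored row-packed as
// {r00, r01, r02, t0, r10, r11, r12, t1, r20, r21, r22, t2}.
// The tilt axis (X) and 45-degree angle are assumptions for illustration.
public class TiltRotation {

    // Rigid rotation by thetaRad about the X axis, with zero translation.
    static double[] rotationAboutX(double thetaRad) {
        double c = Math.cos(thetaRad), s = Math.sin(thetaRad);
        return new double[] {
            1, 0,  0, 0,
            0, c, -s, 0,
            0, s,  c, 0
        };
    }

    // Returns a∘b, i.e. the affine that applies b first and then a.
    static double[] concatenate(double[] a, double[] b) {
        double[] c = new double[12];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 4; j++) {
                double sum = 0;
                for (int k = 0; k < 3; k++) {
                    sum += a[i * 4 + k] * b[k * 4 + j];
                }
                if (j == 3) {
                    sum += a[i * 4 + 3]; // add a's translation to the translation column
                }
                c[i * 4 + j] = sum;
            }
        }
        return c;
    }

    // Applies the affine to a 3D point.
    static double[] apply(double[] affine, double[] p) {
        double[] out = new double[3];
        for (int row = 0; row < 3; row++) {
            out[row] = affine[row * 4] * p[0] + affine[row * 4 + 1] * p[1]
                     + affine[row * 4 + 2] * p[2] + affine[row * 4 + 3];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] tilt = rotationAboutX(Math.toRadians(45));
        // A hypothetical existing view transform: pure translation by (1, 2, 3).
        double[] view = {1, 0, 0, 1, 0, 1, 0, 2, 0, 0, 1, 3};
        double[] reoriented = concatenate(tilt, view);
        double[] q = apply(tilt, new double[] {0, 0, 1});
        // Expected (0, -sin 45°, cos 45°), approximately (0, -0.7071, 0.7071)
        System.out.printf("%.4f %.4f %.4f%n", q[0], q[1], q[2]);
        System.out.println(java.util.Arrays.toString(reoriented));
    }
}
```

Composing the rotation on the left (`concatenate(tilt, view)`) applies it after each view's existing transform, so all tiles are rotated together and their relative placement is preserved.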