JaneliaSciComp / BigStitcher-Spark

Running compute-intense parts of BigStitcher distributed
BSD 2-Clause "Simplified" License

Solver does not converge in a detection, matching, solving sequence #25

Closed by kgabor 4 months ago

kgabor commented 4 months ago

If I use IP_N5_XML.zip and do detection, matching, and solving, the solver fails to converge and the solution is unusable: the residuals are on the order of the dimensions of the dataset. However, if I use the IPs from IP_N5_XML_IPs.zip and then do matching and solving, it succeeds.

The number of IPs and RANSAC matches differs only minimally between the two cases, and in bdv the interest points look visually the same in both.

Reproduction

With BigStitcher-Spark installed from the main branch as of 2024-04-16, and using the IP_N5_XML.zip dataset, the problem can be reproduced with 2 tiles.

Failing scenario

~/BigStitcher-Spark/detect-interestpoints --label=beads -s 1.8 -t 0.008 --xml=dataset.xml --downsampleXY=2 -i0 0 -i1 255 -vi '18,0' -vi '18,1'
~/BigStitcher-Spark/match-interestpoints --clearCorrespondences --label=beads --method=FAST_ROTATION --xml=dataset.xml -vi '18,0' -vi '18,1'
~/BigStitcher-Spark/solver --xml=dataset.xml --sourcePoints=IP --label=beads -vi '18,0' -vi '18,1'

The solver log looks like this:

9997: 378.49471691893393 378.49471691890756
9998: 378.49471691893393 378.49471691890756
9999: 378.49471691893393 378.49471691890756
Concurrent tile optimization loop took 1087 ms, total took 1088 ms
Successfully optimized configuration of 2 tiles after 10000 iterations:
  average displacement: 378.495px
  minimal displacement: 378.495px
  maximal displacement: 378.495px

Succeeding scenario

~/BigStitcher-Spark/detect-interestpoints --label=beads -s 1.8 -t 0.008 --xml=dataset.xml --downsampleXY=2 -i0 0 -i1 255 -vi '18,0' -vi '18,1' --blockSize="1600,1600,1600"
~/BigStitcher-Spark/match-interestpoints --clearCorrespondences --label=beads --method=FAST_ROTATION --xml=dataset.xml -vi '18,0' -vi '18,1'
~/BigStitcher-Spark/solver --xml=dataset.xml --sourcePoints=IP --label=beads -vi '18,0' -vi '18,1'

The solver log looks like this:

200: 0.8092636982485427 0.8092636982485416
201: 0.8092636982485427 0.8092636982485416
Concurrent tile optimization loop took 110 ms, total took 110 ms
Successfully optimized configuration of 2 tiles after 202 iterations:
  average displacement: 0.809px
  minimal displacement: 0.809px
  maximal displacement: 0.809px

Full logs and interestpoint.n5 are available here.

So the solver succeeds if I use a block size in the detection step large enough that all IPs are detected in a single Spark task. My hypothesis is that when IPs from different blocks are merged, pathological values enter the catalog, perhaps NaN?
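A quick sanity check along these lines could scan the merged list for clashing ids and non-finite coordinates. This is only a sketch: the `InterestPoint` record below is a hypothetical stand-in for the real class, keeping just an integer id and a 3D position.

```java
import java.util.*;

public class IpSanityCheck {
    // Hypothetical stand-in for an interest point: an integer id plus a 3D position.
    record InterestPoint(int id, double[] pos) {}

    // True if the merged list contains duplicate ids or non-finite coordinates.
    static boolean hasPathologicalValues(List<InterestPoint> merged) {
        Set<Integer> seen = new HashSet<>();
        for (InterestPoint ip : merged) {
            if (!seen.add(ip.id()))
                return true; // duplicate id
            for (double c : ip.pos())
                if (!Double.isFinite(c))
                    return true; // NaN or infinite coordinate
        }
        return false;
    }

    public static void main(String[] args) {
        // Two blocks that each numbered their detections locally from 0:
        List<InterestPoint> merged = List.of(
                new InterestPoint(0, new double[] { 10, 20, 30 }),   // from block A
                new InterestPoint(0, new double[] { 500, 20, 30 })); // from block B, clashing id
        System.out.println(hasPathologicalValues(merged)); // prints true
    }
}
```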

StephanPreibisch commented 4 months ago

Thanks so much @kgabor! The issue was that we had duplicate IDs, and IDs in random order, when interest points are detected in parallel across blocks inside an image.

https://github.com/JaneliaSciComp/BigStitcher-Spark/commit/8cdd7bb03d1f9a62a33770e3fcdc8abb3feda47f
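The commit above contains the actual fix; purely as an illustration of the idea, merged per-block detections can be renumbered deterministically before saving. The `InterestPoint` record and the position-based sort key below are assumptions for this sketch, not the real implementation.

```java
import java.util.*;

public class ReassignIds {
    // Hypothetical stand-in for an interest point (integer id + 3D position).
    record InterestPoint(int id, double[] pos) {}

    // After parallel per-block detection, ids from different blocks may clash
    // and arrive in arbitrary order. Sort deterministically (here: by z, then
    // y, then x position) and renumber sequentially from 0.
    static List<InterestPoint> reassignIds(List<InterestPoint> merged) {
        List<InterestPoint> sorted = new ArrayList<>(merged);
        sorted.sort(Comparator
                .comparingDouble((InterestPoint ip) -> ip.pos()[2])
                .thenComparingDouble(ip -> ip.pos()[1])
                .thenComparingDouble(ip -> ip.pos()[0]));
        List<InterestPoint> out = new ArrayList<>(sorted.size());
        for (int i = 0; i < sorted.size(); i++)
            out.add(new InterestPoint(i, sorted.get(i).pos()));
        return out;
    }

    public static void main(String[] args) {
        List<InterestPoint> merged = List.of(
                new InterestPoint(0, new double[] { 500, 20, 30 }), // block B, local id 0
                new InterestPoint(0, new double[] { 10, 20, 30 })); // block A, local id 0
        for (InterestPoint ip : reassignIds(merged))
            System.out.println(ip.id() + " " + Arrays.toString(ip.pos()));
    }
}
```

With unique, consistently ordered ids, the correspondences fed to the solver refer to the intended points regardless of how many blocks (Spark tasks) the detection was split into.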