InsightSoftwareConsortium / ITKMinimalPathExtraction

Vessel and tube centerline extraction
http://www.insight-journal.org/browse/publication/213
Apache License 2.0

Gradient sometimes zero (which prevents a gradient descent optimizer from descending) #61

Open romangrothausmann opened 5 years ago

romangrothausmann commented 5 years ago

I have experienced many cases where I can extract a path with the IterateNeighborhoodOptimizer but where the GradientDescentOptimizer does not start to descend, even though the speed function is continuous (based on a binary segmentation), apparently because the initial local gradient is calculated to be zero (start-point, end-point, speed function etc. are kept the same). I have a vague feeling that this thresholding: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/6ae35b9d044ff2be22ce42776c64258be9252313/include/itkSingleImageCostFunction.hxx#L145-L152 against a hard-coded value of 15.0: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/6ae35b9d044ff2be22ce42776c64258be9252313/include/itkSingleImageCostFunction.hxx#L140 could be the culprit.
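For illustration, a paraphrase of the kind of check I mean (simplified, not the actual module code):

#include "itkArray.h"
#include <cmath>

// Paraphrase of the referenced thresholding (simplified): if the centrally
// differenced derivative is "too large", it is zeroed entirely, which leaves a
// gradient descent optimizer with nothing to descend along.
void ApplyDerivativeThreshold(itk::Array<double> & derivative,
                              const double derivativeThreshold = 15.0) // the hard-coded value mentioned above
{
  double magnitude = 0.0;
  for (unsigned int i = 0; i < derivative.GetSize(); ++i)
  {
    magnitude += derivative[i] * derivative[i];
  }
  magnitude = std::sqrt(magnitude);
  if (magnitude > derivativeThreshold)
  {
    derivative.Fill(0.0); // a huge (possibly spurious) gradient becomes exactly zero
  }
}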

Another possible reason I could imagine is that the floating-point precision might not suffice under some circumstances to calculate a non-zero gradient, even if there is no extremum in the cost function at the current optimizer position.

I use this CLI for the testing: https://github.com/romangrothausmann/ITK-CLIs/blob/bfb1312142d505cacd6770e4d5acc23475290c8f/min-path_seg.cxx

romangrothausmann commented 5 years ago

Where could I set the precision used for the gradient calculation, or is it hard-coded somewhere to float or double?

romangrothausmann commented 5 years ago

While testing that #62 works as expected, even setting a DerivativeThreshold of 1e9 did not solve the issue reported here: under some (unknown) circumstances no gradient descent optimizer starts to run, apparently because the computed gradient is zero, even though the IterateNeighborhoodOptimizer does find lower values with the same step size. @thewtex Any idea what could cause this? Can you re-open the issue (I can't)?

romangrothausmann commented 5 years ago

It seems float is used by default for the gradient calculation https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.h#L41-L43 and to my understanding there is no direct way to override this default when instantiating SpeedFunctionToPathFilter, nor by separately instantiating its SingleImageCostFunction, because the PhysicalCentralDifferenceImageFunction https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.h#L103 cannot be re-set. However, the final default that gets used seems to come from the superclass CostFunctionTemplate< TInternalComputationValueType >::ParametersValueType https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.h#L91 Is there any way to set this to double when using SpeedFunctionToPathFilter in a program (e.g. in min-path_seg) without modifying the code of this module by adding another template parameter? Could the limited precision be the source of this issue?

thewtex commented 5 years ago

@romangrothausmann we could also change the default to double if you find that it corrects behavior.

romangrothausmann commented 5 years ago

Many thanks @thewtex for your reply. How can I specify double in this case? I.e. how can I specify the template parameter TInternalComputationValueType of the superclass when it is inherited https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.h#L52-L53 I cannot find where the default is set for this. To my understanding that must happen somewhere, but there seems to be no default specification for template< typename TInternalComputationValueType >. Or what other hacky ways would suffice for testing?

thewtex commented 5 years ago

It looks like the default is already double:

https://github.com/InsightSoftwareConsortium/ITK/blob/99aaeaf1a553866f1e36d4ec363d2a37e1377165/Modules/Numerics/Optimizers/include/itkCostFunction.h#L67
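For reference, a minimal illustration of how such a defaulted template parameter works (abbreviated, not the actual ITK headers):

// Abbreviated illustration: the default template argument means that writing
// CostFunctionTemplate<> (or inheriting without an explicit argument) already
// uses double for ParametersValueType.
template <typename TInternalComputationValueType = double>
class CostFunctionTemplate
{
public:
  using ParametersValueType = TInternalComputationValueType;
};

// Both aliases therefore use double:
using DefaultCostFunction  = CostFunctionTemplate<>;
using ExplicitCostFunction = CostFunctionTemplate<double>;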

romangrothausmann commented 5 years ago

Many thanks @thewtex for pointing that out. I was not aware of this way of defining a default and was only used to seeing something like https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.h#L43

@thewtex could you re-open the issue? It does not seem to be resolved, and since the default already is double I currently have no idea what else could be the source of the problem, do you?

thewtex commented 5 years ago

I am not sure -- unfortunately, I do not have time to dig into it at the moment.

romangrothausmann commented 5 years ago

Edited the issue description, now mentioning that the speed image is based on a binary segmentation image (dilated by a parabolic SE in a simple case), so noise should not be the issue.

thewtex commented 5 years ago

How about using an inverse distance map generated from the segmentation for the speed image?

romangrothausmann commented 5 years ago

Thanks @thewtex for that idea, which would be much quicker to generate than my current approach of using a fast-marching map of the 3D skeleton (inside the segmentation) squeezed into the range [0; 1] by a sigmoid. However, my current approach has the advantage that the speed along the centerline is always close to 1, independent of the local extent of the vessels (which varies by a factor of 1000 in our dataset).
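For completeness, a minimal sketch of the distance-map idea (filter choices and parameters are my assumptions, not code from this thread): take the distance to the segmentation boundary inside the object and rescale it to [0; 1], so speed is highest away from the boundary.

#include "itkImage.h"
#include "itkSignedMaurerDistanceMapImageFilter.h"
#include "itkThresholdImageFilter.h"
#include "itkRescaleIntensityImageFilter.h"

using MaskImageType  = itk::Image<unsigned char, 3>;
using SpeedImageType = itk::Image<float, 3>;

// Sketch: build a speed image from a binary segmentation via a distance map.
SpeedImageType::Pointer MakeSpeedImage(MaskImageType::Pointer mask)
{
  using DistanceFilter = itk::SignedMaurerDistanceMapImageFilter<MaskImageType, SpeedImageType>;
  auto distance = DistanceFilter::New();
  distance->SetInput(mask);
  distance->InsideIsPositiveOn();   // positive distances inside the segmentation
  distance->SquaredDistanceOff();
  distance->UseImageSpacingOn();

  using ThresholdFilter = itk::ThresholdImageFilter<SpeedImageType>;
  auto clampOutside = ThresholdFilter::New();
  clampOutside->SetInput(distance->GetOutput());
  clampOutside->ThresholdBelow(0.0);  // outside voxels ...
  clampOutside->SetOutsideValue(0.0); // ... get zero speed

  using RescaleFilter = itk::RescaleIntensityImageFilter<SpeedImageType, SpeedImageType>;
  auto rescale = RescaleFilter::New();
  rescale->SetInput(clampOutside->GetOutput());
  rescale->SetOutputMinimum(0.0f);
  rescale->SetOutputMaximum(1.0f);
  rescale->Update();

  return rescale->GetOutput();
}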

Still, I think that another speed map would only be a work-around for the issue here, because I would not expect the local gradient to be zero when the IterateNeighborhoodOptimizer does start to propagate at the start-point and even reaches the end-point.

romangrothausmann commented 5 years ago

Just to mention that I use this CLI for the testing: https://github.com/romangrothausmann/ITK-CLIs/blob/bfb1312142d505cacd6770e4d5acc23475290c8f/min-path_seg.cxx

romangrothausmann commented 5 years ago

Since the testing CLI uses linear interpolation, #63 seems not to be the source for this issue.

romangrothausmann commented 5 years ago

The gradient calculation uses the image spacing https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.hxx#L64 whereas the IterateNeighborhoodOptimizer uses its neighborhood size for the stepping https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/src/itkIterateNeighborhoodOptimizer.cxx#L191-L193 Could this be the source of the issue?

richardbeare commented 5 years ago

I have tracked down what I believe to be the problem, and will attempt to generate tests to show it.

The root of the issue is here

The fast marching tools generate arrival time images that the gradient descent algorithm tracks through. When setting up the arrival image, all voxels are set to the maximum floating point value. If any of these voxels remain in the neighbourhood of the end point, the gradient calculation is messed up.

The default values are OK for images with unit spacing and unit speed functions, but break down for fractional spacing etc. It turns out that the units for TargetOffset are voxels, and thus it shouldn't depend on the tolerance setting.

TargetOffset should be 2. No need to depend on tolerance.

romangrothausmann commented 5 years ago

Many thanks @richardbeare for looking into this. I had stumbled over https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSpeedFunctionToPathFilter.hxx#L94 but wasn't sure whether FastMarching would expect a voxel or a physical offset (and forgot to investigate further). Please go ahead and open a PR that removes the multiplication and I will try to test whether that solves the issue.

richardbeare commented 5 years ago

I have been trying to reproduce the behaviour in the existing tests and have run into lots of other problems. It's going to take some work to sort out.

I now think the TargetOffset should probably be:

marching->SetTargetOffset( 2.0 * maximumvoxelspacing); 

But will experiment further to sort it out.

The current tests do break when I modify the test images to have much smaller voxels, but part of the problem is that the initial step size is way too large: the first point on the rendered path is nowhere near the correct starting point. It isn't giving the error that the gradient is initially zero. Perhaps there is something specific to the 3D case that caused that error.

richardbeare commented 5 years ago

Thinking further - it seems to me that there is no approach using the TargetOffset that will guarantee that all pixels in the immediate neighbourhood have the arrival function evaluated. A guess can be made if the value of the speed function in the neighbourhood is known, but this sounds error prone. I think the most reliable option will be to fill the neighbourhood with dummy target points, which get used to terminate the fast marching step but aren't used by the path phase.

richardbeare commented 5 years ago

More thinking and playing around with tests of fractional voxels: the existing test images don't trigger the problem we saw with Roman's data. However, things are clearer to me now. If the backtracked path passes a voxel that isn't visited by the fast marching filter, then there will be problems in the gradient calculation. This could happen if the face-connected neighbourhood of the endpoint includes such points, or if the path transits the edge of a mask. In general the result will be an incorrect gradient of some sort.

The best option may be to modify the PhysicalCentralDifferenceImageFunction to ignore neighbours with the fast marching marker value, and use a one-sided difference instead in those cases.
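A sketch of that idea for a single axis (a free function with an assumed marker test, not a patch to the module): use a central difference when both neighbours hold valid arrival times, fall back to a one-sided difference when only one side was reached by fast marching, and give up (zero) only when neither side is usable.

#include "itkImage.h"
#include <limits>

using ArrivalImageType = itk::Image<float, 3>;

double AxisDerivative(const ArrivalImageType * arrival,
                      const itk::Index<3> & center,
                      unsigned int axis,
                      const float unvisitedValue = std::numeric_limits<float>::max() / 2)
{
  const double spacing = arrival->GetSpacing()[axis];
  const auto & region = arrival->GetLargestPossibleRegion();

  itk::Index<3> left = center;
  itk::Index<3> right = center;
  left[axis] -= 1;
  right[axis] += 1;

  const bool leftOk  = region.IsInside(left)  && arrival->GetPixel(left)  < unvisitedValue;
  const bool rightOk = region.IsInside(right) && arrival->GetPixel(right) < unvisitedValue;
  const double centerValue = arrival->GetPixel(center);

  if (leftOk && rightOk)  // regular central difference
    return (arrival->GetPixel(right) - arrival->GetPixel(left)) / (2.0 * spacing);
  if (rightOk)            // one-sided, forward
    return (arrival->GetPixel(right) - centerValue) / spacing;
  if (leftOk)             // one-sided, backward
    return (centerValue - arrival->GetPixel(left)) / spacing;
  return 0.0;             // no usable neighbour on this axis
}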

romangrothausmann commented 5 years ago

Many thanks @richardbeare for the thorough investigations.

existing test images don't trigger the problem we saw with Roman's data.

I could imagine that it could be the small voxel size of my data (0.00015) or the fact that it is quite a large dataset (3499 x 3499 x 2796 to be precise). Passing a "voxel that isn't visited by the fast marching filter" is surely critical, and I have experienced that before, but then the gradient descent did start and just "got stuck" before reaching the end. In this case, however, the gradient descent does not even start, and I've checked that its close proximity is definitely not masked.

The best option may be to modify the PhysicalCentralDifferenceImageFunction to ignore neighbours with the fast marching marker value, and use a one sided difference instead in those cases.

That's a great idea. Currently it uses the left and the right value for the partial derivative https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.hxx#L73-L74 This can lead to a very large gradient (possibly even in the wrong direction) in case one side is not reached by the fast marching filter, which then gets thresholded (even with a DerivativeThreshold of 1e9): https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.hxx#L149-L152

Another way would be to extend the gradient descent code with that from the neighbourhood optimizer in case the gradient is found to be zero. This, however, would (I think) require the introduction of a new optimizer, whereas @richardbeare's idea would only touch the PhysicalCentralDifferenceImageFunction and nothing more.

@richardbeare how would one decide which side to take in such cases? Would it be possible to mix them, in case the border of the cost function is reached to the right for one partial derivative and to the left for another? I guess in the end it is better to have an "odd" gradient than none.

richardbeare commented 5 years ago

Ah, I hadn't noticed that threshold. It certainly explains why the starting derivative can be zero. I was assuming that a large starting gradient caused a jump into an illegal part of the image.

The other way in which the effect may happen, without the target point touching the mask, is due to termination of the fast marching stages. I am attempting to create this effect in some tests.

romangrothausmann commented 5 years ago

Yes, considering your findings, I think that the initial gradient computation leads to a very large gradient (possibly even in the wrong direction) "due to termination of the fast marching" just at the start point, which then gets zeroed by that thresholding. A one-sided partial derivative is likely able to solve this issue. Currently, I cannot say why this problem is not always hit.

richardbeare commented 5 years ago

Slightly more progress - I've managed to reproduce the situation where the initial gradient is zero. It happens when the values in the speed image are very low, leading to neighbours remaining unevaluated by fast marching. This leads to a very large gradient that gets set to zero by the magnitude check.

Checking for illegal neighbours before computing the gradient is definitely the way to address these problems. The question is how to fit it into the current framework. We could hard-code the value used to check for illegality, much along the lines of the current gradient threshold. It doesn't seem possible to plug different gradient calculators into the SingleImageCostFunction, thus we can't make a "safe" version of PhysicalCentralDifferenceImageFunction to plug in at user discretion.

The trickiest part of the problem is deciding on the design. Once the value is known the rest is simple, but messy looking.

romangrothausmann commented 5 years ago

I've managed to reproduce the situation where the initial gradient is zero. It happens when the values in the speed image are very low, leading to neighbours remaining unevaluated by fast marching.

That's really great! Many thanks @richardbeare for finding that problem; I would not have expected it there. It is odd, though, that some voxels can remain unevaluated by fast marching. @thewtex is that a known issue of the current implementation of the fast marching filter? If so, should that be addressed or is it inevitable?

The question is how to fit it into the current framework. We could hard-code the value used to check for illegality, much along the lines of the current gradient threshold.

Well, I'd go for a hard-coded value only if it is definitely fixed; otherwise a hard-coded default that could be changed at user level (like #62).

It doesn't seem possible to plug different gradient calculators into the SingleImageCostFunction, thus we can't make a "safe" version of PhysicalCentralDifferenceImageFunction to plug in at user discretion.

Hm, wouldn't it be possible with a change to the implementation of SingleImageCostFunction, making the PhysicalCentralDifferenceImageFunction an optional template parameter so that another one could be provided?
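Something like this hypothetical sketch (class and parameter names made up, not a proposed patch): an extra, defaulted template parameter for the gradient calculator, so existing user code keeps compiling unchanged.

#include "itkPhysicalCentralDifferenceImageFunction.h"

// Hypothetical sketch only, names illustrative.
template <typename TImage,
          typename TGradientFunction = itk::PhysicalCentralDifferenceImageFunction<TImage, double>>
class SingleImageCostFunctionSketch
{
public:
  using GradientFunctionType = TGradientFunction;
  // The implementation would instantiate GradientFunctionType::New() wherever
  // PhysicalCentralDifferenceImageFunction is created today.
};

// SingleImageCostFunctionSketch<ImageType>                      -> current behaviour
// SingleImageCostFunctionSketch<ImageType, SafeDifferenceType>  -> user-supplied calculator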

This leads to a very large gradient that gets set to zero by the magnitude check.

What I do wonder, though: how come this applies to all partial derivatives? Or, the other way round, why is not even a single partial derivative OK (if the NN-opt runs)? Sure, the direction would not be appropriate, but it should at least make the gradient descent optimizer advance one step.

richardbeare commented 5 years ago

OK, less simple than I thought. The PhysicalCentralDifferenceImageFunction uses the interpolator to determine image values one voxel away along each axis. My original thought was to check whether those neighbours were legal, and not use the interpolation result if they aren't. However, the interpolator uses the entire neighbourhood of the neighbouring voxel, and will thus return a spurious value if one of those neighbours is illegal, e.g.:

suppose we're checking the voxel to the "left" on the x dimension. First we check whether the voxel at the left index is legal. If it is, then we ask for the interpolation result. However the interpolator uses all neighbours of that voxel (unless we're lined up with the voxel centres).

Not sure how to address this. Ideal case would be to pass a mask to the interpolator to indicate which values are legal.

Perhaps a special interpolator is the way to go.
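Roughly what such a special interpolator could do (a standalone 3D function rather than an itk::InterpolateImageFunction subclass; the illegal-value test is an assumption): drop corner voxels that still carry the fast-marching initialisation value and renormalise the remaining trilinear weights.

#include "itkImage.h"
#include "itkContinuousIndex.h"
#include <cmath>
#include <limits>

using ArrivalImageType = itk::Image<float, 3>;

// Trilinear interpolation that ignores "illegal" corners (e.g. voxels never
// reached by fast marching) and renormalises the remaining weights.
float SafeTrilinearInterpolate(const ArrivalImageType * image,
                               const itk::ContinuousIndex<double, 3> & cindex,
                               const float illegalValue = std::numeric_limits<float>::max() / 2)
{
  itk::Index<3> base;
  double frac[3];
  for (unsigned int d = 0; d < 3; ++d)
  {
    base[d] = static_cast<itk::IndexValueType>(std::floor(cindex[d]));
    frac[d] = cindex[d] - static_cast<double>(base[d]);
  }
  double sum = 0.0;
  double weightSum = 0.0;
  for (unsigned int corner = 0; corner < 8; ++corner)
  {
    itk::Index<3> idx = base;
    double weight = 1.0;
    for (unsigned int d = 0; d < 3; ++d)
    {
      const bool upper = (corner & (1u << d)) != 0;
      idx[d] += upper ? 1 : 0;
      weight *= upper ? frac[d] : (1.0 - frac[d]);
    }
    if (!image->GetLargestPossibleRegion().IsInside(idx))
      continue; // outside the image
    const float value = image->GetPixel(idx);
    if (value >= illegalValue)
      continue; // unvisited voxel: drop it from the weighted sum
    sum += weight * value;
    weightSum += weight;
  }
  // If every corner was illegal, signal that by returning the marker value.
  return (weightSum > 0.0) ? static_cast<float>(sum / weightSum) : illegalValue;
}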

richardbeare commented 5 years ago

I've implemented a safe interpolator and some tests. The code is in the SafeInterpolator branch of my repo.

As pointed out in #63, interpolators are used in two places, and the problem with large neighbouring values can happen in either. Currently I have modified PhysicalCentralDifference to use the new interpolator, but it would be better if we allowed it to be set via the cost function.

I expect that this change will allow better behaviour for speed functions through funny-shaped masks, but I'd imagine that really rough speed functions are likely to continue to cause problems. Not sure there's any way around that.

There is a definite need for the user to set learning rates, termination values and derivative thresholds based on voxel size and speed function values, and we should probably provide more documentation about that, as it is extremely difficult to figure out why errors occur.

It may be that the regular step gradient descent is much more robust.

richardbeare commented 5 years ago

The reason the fast marching leaves pixels unevaluated (besides the zero-speed ones) is that it stops after reaching all the targets. There is a parameter that causes it to proceed slightly beyond the targets, and this is set by default, but sometimes the speed can be low enough for the front not to reach the neighbouring pixel.
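Roughly how that is wired up, assuming the classic itk::FastMarchingUpwindGradientImageFilter interface (the offset value is illustrative): the front is propagated until all targets are reached plus a small TargetOffset beyond them, so with a very low speed near a target that offset may still not cover the target's face-connected neighbours.

#include "itkFastMarchingUpwindGradientImageFilter.h"
#include "itkImage.h"

using ArrivalImageType = itk::Image<float, 3>;
using SpeedImageType   = itk::Image<float, 3>;
using MarchingType = itk::FastMarchingUpwindGradientImageFilter<ArrivalImageType, SpeedImageType>;

// Sketch of the stopping-condition setup (assumed interface of the classic filter).
void ConfigureMarching(MarchingType * marching,
                       SpeedImageType * speed,
                       MarchingType::NodeContainer * trialSeeds,
                       MarchingType::NodeContainer * targets)
{
  marching->SetInput(speed);
  marching->SetTrialPoints(trialSeeds);
  marching->SetTargetPoints(targets);
  marching->SetTargetReachedModeToAllTargets(); // stop only after every target is reached
  marching->SetTargetOffset(2.0);               // march slightly beyond the targets
}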

romangrothausmann commented 5 years ago

I've implemented a safe interpolator and some tests. The code is in the SafeInterpolator branch of my repo.

Many thanks @richardbeare, I wanted to try it and had a look, but I can't find an implementation of itkLinearInterpolateSelectedNeighborsImageFunction, which is used here: https://github.com/richardbeare/ITKMinimalPathExtraction/commit/5e6dcb30da2066bddad89d4e3e3f6509d7c39387#diff-709a2cd1ccbb3cef49595927b0bdaa9dR112

suppose we're checking the voxel to the "left" on the x dimension. First we check whether the voxel at the left index is legal. If it is, then we ask for the interpolation result. However the interpolator uses all neighbours of that voxel (unless we're lined up with the voxel centres).

Hm, how about not interpolating but using the left voxel value (i.e. the voxel center)? The code already uses the image spacing, so I'd say it already aims to take the value at the voxel center: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.hxx#L64 That way we could avoid the need for an interpolator and would indirectly resolve #63.

I'd imagine that really rough speed functions are likely to continue to cause problems.

Perhaps some form of smoothing of the speed function would help there. At least that would be something for a user to inspect and take care of, i.e. it would not need a modification of the current code. As far as I can see, for me it would be fine if the gradient descent starts running but stops before reaching the end. In that case I would know I have to investigate the local situation, including checking the speed function at that point. In the case here, the problem is that it is not the speed function but the cost function, which is normally not directly accessible under common usage.

The reason the fast marching leaves pixels unevaluated (besides the zero-speed ones) is that it stops after reaching all the targets.

I see. The way I understood your comment above was that even along the way of the fast marching front some pixels would not be evaluated, even far off from the target or from zero-speed pixels. No evaluation for zero speed and none beyond the offset past the target is clear.

richardbeare commented 5 years ago

@romangrothausmann I've added yet another branch with a version allowing the interpolator to be changed by the user. The default is to keep the linear interpolator, but now the user can modify the cost function interpolator, and the cost function will provide that interpolator to the gradient function on initialization. There may be some problems if the interpolator gets changed after initialization, but I don't think that is the usual use case. I don't think these changes break any existing code, but see how you go.

romangrothausmann commented 5 years ago

Tested with ITK @ https://github.com/romangrothausmann/ITK/commit/47768d88e02038cd89c9dc89208a5299b0cb5d65 using https://github.com/richardbeare/ITKMinimalPathExtraction/commit/2f54421df763e8121c5a3f828e630c85769c81c0 for ITKMinimalPathExtraction. Setting the special interpolator works (https://github.com/romangrothausmann/ITK-CLIs/commit/b6b9667392016c2f02797a1b65866b71201807aa). With that, the gradient descent optimizer did start for our test case and got as close to the next way-point as demanded by the termination value set. However, for the following run to the next target point the gradient descent optimizer again did not start to run due to a zero gradient. I could imagine that the last point from the grad. desc. opt. of the former run was still too far away from the way-point to fall into the valid region of the newly calculated cost function.

romangrothausmann commented 5 years ago

Hm, looking closer, it seems that the last optimizer front point is used (as it should be) and not the way-point for stopping the fast marching https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/57bb5b701c3766fc0cbe1ce2c5c2a23dd75b5957/include/itkSpeedFunctionToPathFilter.hxx#L97-L117 but I wonder why the next front is also used for setting the target points: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/57bb5b701c3766fc0cbe1ce2c5c2a23dd75b5957/include/itkSpeedFunctionToPathFilter.hxx#L119-L127 Doesn't that mean that the fast marching already stops if it has reached the second-next way-point (or end-point), possibly before reaching the next way-point? Or does the fast marching stop only when all target points have been reached? Wouldn't using the next front in that case be unnecessary?

romangrothausmann commented 5 years ago

It seems that in the branch ConsolidateInterpolator the TerminationValue is still used for the offset: https://github.com/richardbeare/ITKMinimalPathExtraction/blob/2f54421df763e8121c5a3f828e630c85769c81c0/include/itkSpeedFunctionToPathFilter.hxx#L94 Could that be the problem?

richardbeare commented 5 years ago

I eventually decided that fiddling with the offset wasn't the way to go, as there were several ways that illegal values could influence the interpolator, and fixing the interpolator should fix all of them.

I'd assume that fast marching needs to be run separately from each way point in series, using the next destination as the stopping condition.

Which speed function are you using at present?

I've had trouble getting regular access to a host with enough RAM to reproduce the exact problem, but it looks like I'll need to try again.

romangrothausmann commented 5 years ago

@richardbeare would you create a PR here that offers the changes you made to avoid the gradient being zero? I think it would be a very valuable contribution for others using ITKMinimalPathExtraction as well.

richardbeare commented 5 years ago

Certainly planning to. I suspect that more documentation may be necessary too. Slightly daunted by the number of bug fixes that I've included that change results.

romangrothausmann commented 5 years ago

Continuing to post on this issue as long as we do not have a PR where these comments could go: Using https://github.com/romangrothausmann/SITK-CLIs/pull/3 to create another speed function and using https://github.com/romangrothausmann/ITK-CLIs/pull/6 (which incorporates https://github.com/romangrothausmann/ITK-CLIs/pull/4), it was possible to extract a path from the start-point to the end-point with regular step gradient descent (RSGD) using a step size smaller than the voxel spacing on a sub-sampled dataset.

So far it is not possible to get a path if a way-point is given as well, nor on the full dataset (which needs https://github.com/romangrothausmann/ITK-CLIs/pull/5, but hitting https://github.com/InsightSoftwareConsortium/ITK/issues/715 with that, see https://github.com/romangrothausmann/ITK-CLIs/pull/5#issuecomment-499818178).

romangrothausmann commented 4 years ago

Update: https://github.com/richardbeare/ITKMinimalPathExtraction/tree/ConsolidateInterpolator solves this and the other issues.

richardbeare commented 4 years ago

Apologies - very busy at present...

thewtex commented 4 years ago

So far it is not possible to get a path if a way-point is given as well, nor on the full dataset

@romangrothausmann @richardbeare has this been resolved in the ConsolidateInterpolator branch?

thewtex commented 4 years ago

@richardbeare ConsolidateInterpolator looks like excellent progress! Do you mind if I integrate, and we iterate as needed?

richardbeare commented 4 years ago

Sounds like a good start. Thanks.

romangrothausmann commented 4 years ago

So far it is not possible to get a path if a way-point is given as well, nor on the full dataset

@romangrothausmann @richardbeare has this been resolved in the ConsolidateInterpolator branch?

Yes, worked perfectly when I tested with https://github.com/richardbeare/ITKMinimalPathExtraction/commit/f89bc840d7fee36a6e303652e00d338fa35023b7.

thewtex commented 4 years ago

Rebased here: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/pull/71

richardbeare commented 4 years ago

Here's a summary of what I remember about the various problems we attempted to address. I thought I would write it down somewhere as a reference. These issues are related to approaches that use some form of gradient descent, which is needed when sub-voxel precision is required. The minimum neighbour iterator avoids these problems, but does not provide sub-voxel precision.

Some of the problems stem from the use of geodesics derived from masks. Masks have areas of zero speed. The resultant path is meant to be excluded from these regions. The arrival function is infinite in the areas of zero speed. The presence of infinite values in the arrival function leads to problems computing gradients.

The second way that infinite values may appear in the arrival function is early termination of the arrival function calculation. Usually we want early termination for performance reasons. However, if termination is too early then neighbours of target points may have infinite arrival times, leading to problems. There are some heuristics in the original attempting to deal with this, but they are wrong (incorrectly coupled to the stopping condition) and don't work if the speed function is low in the target area.

I suspect there is no good way to reliably solve the problem without getting really complex. Sometimes the shortest path should go on the edge of a mask, next to infinite values. How can we do a gradient calculation that is certain not to project into the mask? The tracking will fail if the optimizer ends up in the mask (termination by exceeding the step count).

  1. Bug - only a single location in way-points was being included in the list of fast marching targets. This wasn't a problem for this project, but meant that extended waypoints were likely to be processed incorrectly.
  2. Failure to start the gradient descent optimizer was caused by a number of different factors. The fixes make the problems less likely, but don't eliminate them, for reasons outlined above. The reason some of these problems may not have been uncovered previously is that most testing was done with images of unit-size voxels and speed functions with relatively low dynamic range. It becomes difficult to set up the problem reliably if either of these assumptions is broken. Fortunately we've managed to work around these issues at the cost of greater code complexity.
    1. An interpolator that ignores voxels based on user-controlled conditions (i.e. discard a voxel from the gradient calculation if it is higher than some threshold).
    2. Explicitly include neighbours of each target point in the list of required destinations for the arrival time calculation (see the sketch below).
    3. Interrogate the arrival times in the neighbourhood of the starting point so that termination conditions can be set to some fraction of the minimum difference between them.

The last two points avoid the need for users to understand the relationship between voxel spacing and speed function value, which can be very tricky. It is especially useful to have reliable defaults for these when developing and testing speed functions.
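A sketch of point 2.2 (a helper written for this summary, not the module's actual target handling): collect each target index together with its in-bounds face-connected neighbours, so all of them can be added to the fast marching destination list.

#include "itkImage.h"
#include <vector>

using ImageType = itk::Image<float, 3>;

// Return the target index plus its face-connected neighbours that lie inside the image.
std::vector<itk::Index<3>> TargetWithFaceNeighbours(const ImageType * image,
                                                    const itk::Index<3> & target)
{
  std::vector<itk::Index<3>> indices;
  indices.push_back(target);
  const auto & region = image->GetLargestPossibleRegion();
  for (unsigned int d = 0; d < 3; ++d)
  {
    for (int offset : { -1, 1 })
    {
      itk::Index<3> neighbour = target;
      neighbour[d] += offset;
      if (region.IsInside(neighbour))
      {
        indices.push_back(neighbour); // also required to be reached by the front
      }
    }
  }
  return indices;
}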

thewtex commented 4 years ago

@richardbeare thank you for the summary and thoughts!

Thinking about a way we can approach the issues that is robust but avoids complexity: what do you think about constraining / conditioning the speed function and the arrival function to avoid the infinite values that are so troublesome? That is, we suggest that the speed function varies from some small value to 1 instead of 0 to 1, and we replace the infinite arrival times with a multiple of the maximum arrival time in the arrival function.
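A sketch of what that conditioning could look like (filter choice, epsilon and the cap factor are assumptions):

#include "itkImage.h"
#include "itkImageRegionIterator.h"
#include "itkRescaleIntensityImageFilter.h"
#include <limits>

using ImageType = itk::Image<float, 3>;

// Keep the speed function in [epsilon, 1] instead of [0, 1].
ImageType::Pointer ConditionSpeed(ImageType::Pointer speed, const float epsilon = 1e-3f)
{
  using RescaleType = itk::RescaleIntensityImageFilter<ImageType, ImageType>;
  auto rescale = RescaleType::New();
  rescale->SetInput(speed);
  rescale->SetOutputMinimum(epsilon); // no zero-speed voxels, so the front reaches everywhere
  rescale->SetOutputMaximum(1.0f);
  rescale->Update();
  return rescale->GetOutput();
}

// Cap "infinite" arrival times at a multiple of the largest finite arrival time.
void CapArrivalTimes(ImageType::Pointer arrival, const float factor = 10.0f)
{
  const float marker = std::numeric_limits<float>::max() / 2;
  float maxFinite = 0.0f;
  itk::ImageRegionIterator<ImageType> it(arrival, arrival->GetLargestPossibleRegion());
  // First pass: find the largest arrival time that is not the "unvisited" marker.
  for (it.GoToBegin(); !it.IsAtEnd(); ++it)
  {
    if (it.Get() < marker && it.Get() > maxFinite)
      maxFinite = it.Get();
  }
  // Second pass: replace unvisited voxels with a large but finite value.
  for (it.GoToBegin(); !it.IsAtEnd(); ++it)
  {
    if (it.Get() >= marker)
      it.Set(factor * maxFinite);
  }
}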

richardbeare commented 4 years ago

The complexity isn't huge, and most of it that isn't part of the new interpolator relates to either stopping conditions or ensuring that enough of the arrival function is computed; both of those are independent of the infinite values that the user may require. We definitely want to allow infinite values, as they are the only way to implement a certain exclusion zone.

I think the complexity is warranted. Every other approach I can think of is a heuristic that will break at some point. I suspect the only way to go is to be thorough with documentation, e.g. if you have a maze-type problem, with complex exclusion zones and narrow openings, then you are unlikely to be able to use a gradient descent approach reliably in combination with a flat speed function. You may have success with a speed function that is biased towards centre lines, but success isn't certain. Problems will still occur if the paths are narrow.

I suspect that what we are heading towards is one of the few truly tested implementations of this sort of thing around. Apparently even some of the originals had arbitrary exclusion criteria that set gradients to zero (I think there is a reference in the comments).