romangrothausmann opened this issue 5 years ago
Where could I set the precision used for the gradient calculation or is it hard-coded somewhere to float or double?
While testing that #62 works as expected, I found that even setting a DerivativeThreshold of 1e9 did not solve the issue reported here: under some (unknown) circumstances no gradient descent optimizer starts to run, apparently because the computed gradient is zero, even though the IterateNeighborhoodOptimizer does find lower values with the same step size. @thewtex Any idea what could cause this? Can you re-open the issue (I can't)?
It seems `float` is used by default for the gradient calculation
https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.h#L41-L43
and to my understanding there is no direct way to override this default when instantiating SpeedFunctionToPathFilter, nor by separately instantiating its SingleImageCostFunction, because the PhysicalCentralDifferenceImageFunction
https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.h#L103
cannot be re-set.
However, the final default that gets used seems to come from the superclass
CostFunctionTemplate< TInternalComputationValueType >::ParametersValueType
https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.h#L91
Is there any way to set this to `double` when using `SpeedFunctionToPathFilter` in a program (e.g. in min-path_seg) without modifying the code of this module by adding another template parameter for this?
Could the limited precision be the source for this issue?
@romangrothausmann we could also change the default to `double` if you find that it corrects behavior.
Many thanks @thewtex for your reply. How can I specify to use `double` in this case? I.e. how can I specify the template parameter `TInternalComputationValueType` of the superclass when it is inherited:
https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.h#L52-L53
I cannot find where the default is set for this. To my understanding that must happen somewhere, but there seems to be no default specification for `template< typename TInternalComputationValueType >`.
Or what would be other hacky ways that would suffice for testing?
It looks like the default is already `double`:
Many thanks @thewtex for pointing that out. I was not aware of such a way to define a default and was only accustomed to something like https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.h#L43
@thewtex could you re-open the issue? It does not seem to be resolved, and since the default already is `double`, I currently have no idea what else could be the source of the problem. Do you?
I am not sure -- unfortunately, I do not have time to dig into it at the moment.
Edited the issue description, now mentioning that the speed image is based on a binary segmentation image (dilated by a parabolic SE in a simple case), so noise should not be the issue.
How about using an inverse distance map generated from the segmentation for the speed image?
Thanks @thewtex for that idea, which would be much quicker to generate than my current approach using a fast-marching map of the 3D skeleton (inside the segmentation) squeezed into the range [0; 1] by a sigmoid. However, my current approach has the advantage that the speed along the center line is always close to 1, independent of the local extent of the vessels (which varies by a factor of 1000 in our dataset).
Still, I think that another speed map would only be a work-around for the issue here, because I would expect the local gradient not to be zero if the IterateNeighborhoodOptimizer does start to propagate at the start-point and even reaches the end-point.
Just to mention that I use this CLI for the testing: https://github.com/romangrothausmann/ITK-CLIs/blob/bfb1312142d505cacd6770e4d5acc23475290c8f/min-path_seg.cxx
Since the testing CLI uses linear interpolation, #63 seems not to be the source for this issue.
The gradient calculation uses the image spacing https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.hxx#L64 whereas the IterateNeighborhoodOptimizer uses its neighborhood size for the stepping https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/src/itkIterateNeighborhoodOptimizer.cxx#L191-L193 Could this be the source of the issue?
I have tracked down what I believe to be the problem, and will attempt to generate tests to show it.
The root of the issue is here
The fast marching tools generate arrival time images that the gradient descent algorithm tracks through. When setting up the arrival image, all voxels are set to the maximum floating point value. If any of these voxels remain in the neighbourhood of the end point, the gradient calculation is messed up.
The default values are OK for images with unit spacing and unit speed functions, but break down for fractional spacing etc. It turns out that the units for TargetOffset are voxels, and thus it shouldn't depend on the tolerance setting.
TargetOffset should be 2. No need to depend on tolerance.
Many thanks @richardbeare for looking into this. I had stumbled over https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSpeedFunctionToPathFilter.hxx#L94 but wasn't sure whether FastMarching would expect a voxel or physical offset (and forgot to investigate further). Please go ahead and open a PR that removes the multiplication and I will try to test whether that solves the issue.
I have been trying to reproduce the behaviour in the existing tests and have lots of other problems. Going to take some work to sort it out.
I now think the TargetOffset should probably be:

```cpp
marching->SetTargetOffset( 2.0 * maximumvoxelspacing );
```

But I will experiment further to sort it out.
The current tests do break when I modify the test images to have much smaller voxels, but part of the problem is the initial step size is way too large - the first point on the rendered path is nowhere near the correct starting point. It isn't giving the error that the gradient is initially zero. Perhaps there is something specific to the 3D case that caused that error.
Thinking further - it seems to me that there is no approach using the TargetOffset that will guarantee that all pixels in the immediate neighbourhood have the arrival function evaluated. A guess can be made if the value of the speed function in the neighbourhood is known, but this sounds error prone. I think the most reliable option will be to fill the neighbourhood with dummy target points, which get used to terminate the fast marching step but aren't used by the path phase.
Yet more thinking and playing around with tests of fractional voxels, and the existing test images don't trigger the problem we saw with Roman's data. However things are clearer to me now. If the backtracked path passes a voxel that isn't visited by the fast marching filter then there will be problems in the gradient calculation. This could happen if the face connected neighbourhood of the endpoint includes such points, or if the path transits the edge of a mask. In general the result will be an incorrect gradient of some sort.
The best option may be to modify the PhysicalCentralDifferenceImageFunction to ignore neighbours with the fast marching marker value, and use a one sided difference instead in those cases.
Many thanks @richardbeare for the thorough investigations.
existing test images don't trigger the problem we saw with Roman's data.
I could imagine that it could be the small voxel size of my data (0.00015) or the fact that it is quite a large dataset (3499 x 3499 x 2796, to be precise). Passing a "voxel that isn't visited by the fast marching filter" is surely critical, and I have experienced that before, but then the gradient descent did start and just "got stuck" before reaching the end. However, in this case the gradient descent does not even start, and I've checked that its close proximity is definitely not masked.
The best option may be to modify the PhysicalCentralDifferenceImageFunction to ignore neighbours with the fast marching marker value, and use a one sided difference instead in those cases.
That's a great idea. Currently it uses the left and the right values for the partial derivative https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.hxx#L73-L74 This can lead to a very large gradient (even in the wrong direction) in case one side is not reached by the fast marching filter; the gradient then gets thresholded to zero (even with a DerivativeThreshold of 1e9): https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkSingleImageCostFunction.hxx#L149-L152
Another way would be to extend the gradient descent code with that from the neighbourhood optimizer in case the gradient is found to be zero. This, however, would (I think) require the introduction of a new optimizer, whereas @richardbeare's idea would just touch the PhysicalCentralDifferenceImageFunction and nothing more.
@richardbeare how would one decide which side to take in such cases? Would it be possible to mix them, in case the border of the cost function is reached to the right for one partial derivative and to the left for another? I guess in the end it is better to have an "odd" gradient than none.
Ah, I hadn't noticed that threshold. It certainly explains why the starting derivative can be zero. I was assuming that a large starting gradient caused a jump into an illegal part of the image.
The other way in which the effect may happen, without the target point touching the mask, is due to termination of the fast marching stages. I am attempting to create this effect in some tests.
Yes, I think, considering your findings, that the initial gradient computation leads to a very large gradient (possibly even in the wrong direction) "due to termination of the fast marching" just at the start point, which then gets zeroed by that thresholding. A one-sided partial derivative is likely able to solve this issue. Currently, I cannot say why this problem is not always hit.
Slightly more progress - I've managed to reproduce the situation where the initial gradient is zero. It happens when the values in the speed image are very low, leading to neighbours remaining unevaluated by fast marching. This leads to a very large gradient that gets set to zero by the magnitude check.
Checking for illegal neighbours before computing the gradient is definitely the way to address these problems. The question is how to fit it into the current framework. We could hard-code the value used to check for illegality, much along the lines of the current gradient threshold. It doesn't seem possible to plug different gradient calculators into the SingleImageCostFunction, thus we can't make a "safe" version of PhysicalCentralDifferenceImageFunction to plug in at user discretion.
The trickiest part of the problem is deciding on the design. Once the value is known the rest is simple, but messy looking.
I've managed to reproduce the situation where the initial gradient is zero. It happens when the values in the speed image are very low, leading to neighbours remaining unevaluated by fast marching.
That's really great! Many thanks @richardbeare for finding that problem, I would not have expected it there. Odd though, that it can happen that some voxels remain unevaluated by fast marching. @thewtex is that a known issue of the current implementation of the fast marching filter? If so, should that be addressed or is it inevitable?
The question is how to fit it into the current framework. We could hard-code the value used to check for illegality, much along the lines of the current gradient threshold.
Well, I'd go for a hard-coded value only if it is definitely fixed; otherwise a hard-coded default that could possibly be changed at user level (like #62).
It doesn't seem possible to plug different gradient calculators into the SingleImageCostFunction, thus we can't make a "safe" version of PhysicalCentralDifferenceImageFunction to plug in at user discretion.
Hm, wouldn't it be possible with a change to the implementation of SingleImageCostFunction, making the PhysicalCentralDifferenceImageFunction an optional template parameter?
This leads to a very large gradient that gets set to zero by the magnitude check.
What I do wonder though: how come this applies to all partial derivatives? Or the other way round, why is not even a single partial derivative OK (given that the NN-opt runs)? Sure, the direction would not be appropriate, but that should at least make the gradient descent optimizer advance one step.
OK, Less simple than I thought. The PhysicalCentralDifferenceImageFunction uses the interpolator to determine image values one voxel away along each axis. My original thought was to check whether those neighbours were legal, and not use the interpolation result if they aren't. However the interpolator is using the entire neighbourhood of the neighbouring voxel, and thus will return a spurious value if one of those neighbours is illegal. e.g.
suppose we're checking the voxel to the "left" on the x dimension. First we check whether the voxel at the left index is legal. If it is, then we ask for the interpolation result. However the interpolator uses all neighbours of that voxel (unless we're lined up with the voxel centres).
Not sure how to address this. Ideal case would be to pass a mask to the interpolator to indicate which values are legal.
Perhaps a special interpolator is the way to go.
I've implemented a safe interpolator and some tests. The code is in the SafeInterpolator branch of my repo.
As pointed out in #63, interpolators are used in two places, and the problem with large neighbouring values can happen in either. Currently I have modified PhysicalCentralDifference to use the new interpolator, but it would be better if we allowed it to be set via the cost function.
I expect that this change will allow better behaviour for speed functions through funny-shaped masks, but I'd imagine that really rough speed functions are likely to continue to cause problems. Not sure there's any way around that.
There is a definite need for the user to set learning rates and termination values and derivative thresholds based on voxel size and speed function values, and we probably should provide more documentation about that as it is extremely difficult to figure out why errors occur.
It may be that the regular step gradient descent is much more robust.
The reason the fast marching leaves pixels unevaluated (besides the zero speed ones), is that it stops after reaching all the targets. There is a parameter that causes it to proceed slightly beyond the target, and this is set by default, but sometimes the speed can be low enough for the front to not reach the neighbouring pixel.
I've implemented a safe interpolator and some tests. The code is in the SafeInterpolator branch of my repo.
Many thanks @richardbeare, I wanted to try it and had a look, but I can't find an implementation of `itkLinearInterpolateSelectedNeighborsImageFunction`, which is used here:
https://github.com/richardbeare/ITKMinimalPathExtraction/commit/5e6dcb30da2066bddad89d4e3e3f6509d7c39387#diff-709a2cd1ccbb3cef49595927b0bdaa9dR112
suppose we're checking the voxel to the "left" on the x dimension. First we check whether the voxel at the left index is legal. If it is, then we ask for the interpolation result. However the interpolator uses all neighbours of that voxel (unless we're lined up with the voxel centres).
Hm, how about not interpolating but using the left voxel value (i.e. the voxel center)? The code already uses the image spacing, so I'd say it already aims to take the value at the voxel center: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/2ff1ab58f30842ac57d76c8bc9462fcbc7523e67/include/itkPhysicalCentralDifferenceImageFunction.hxx#L64 That way we could avoid the need for an interpolator and would indirectly resolve #63.
I'd imagine that really rough speed functions are likely to continue to cause problems.
Perhaps some form of smoothing of the speed function would help there. At least that would be something for a user to inspect and take care of, i.e. it would not need a modification of the current code. As far as I can see, for me it would be fine if the gradient descent starts running but stops before reaching the end. In that case I would know that I have to investigate the local situation, including checking the speed function at that point. The problem in the case here is that it involves not the speed function but the cost function, which is normally not directly accessible under common usage.
The reason the fast marching leaves pixels unevaluated (besides the zero speed ones), is that it stops after reaching all the targets.
I see. From your comment above I thought you meant that even along the way of the fast marching front some pixels would not be evaluated, even far off from the target or zero-speed pixels. No evaluation for zero speed and none beyond the offset of reaching the target is clear.
@romangrothausmann I've added yet another branch with a version allowing the interpolator to be changed by the user. The default is to keep the linear interpolator, but now the user can modify the cost function interpolator, and the cost function will provide that interpolator to the gradient function on initialization. There may be some problems if the interpolator gets changed after initialization, but I don't think that is the usual use case. I don't think these changes break any existing code, but see how you go.
Tested with ITK @ https://github.com/romangrothausmann/ITK/commit/47768d88e02038cd89c9dc89208a5299b0cb5d65 using https://github.com/richardbeare/ITKMinimalPathExtraction/commit/2f54421df763e8121c5a3f828e630c85769c81c0 for ITKMinimalPathExtraction. Setting the special interpolator works (https://github.com/romangrothausmann/ITK-CLIs/commit/b6b9667392016c2f02797a1b65866b71201807aa). With that, the gradient descent optimizer did start for our test case and got as close to the next way-point as demanded by the termination value set.
However, for the following run to the next target point the gradient descent optimizer again did not start to run due to a zero gradient. I could imagine that the last point from the gradient descent optimizer of the former run was still too far away from the way-point to fall into the valid region of the newly calculated cost function.
Hm, looking closer, it seems that the last optimizer front point is used (as it should be), and not the way-point, for stopping the fast marching https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/57bb5b701c3766fc0cbe1ce2c5c2a23dd75b5957/include/itkSpeedFunctionToPathFilter.hxx#L97-L117 but I wonder why the next front is also used for setting the target points: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/57bb5b701c3766fc0cbe1ce2c5c2a23dd75b5957/include/itkSpeedFunctionToPathFilter.hxx#L119-L127 Doesn't that mean that the fast marching stops as soon as it reaches the second-next way-point (or end-point), possibly before reaching the next way-point? Or does the fast marching stop only when all target points have been reached? Wouldn't using the next front be unnecessary in that case?
It seems that in the branch ConsolidateInterpolator the TerminationValue is still used for the offset: https://github.com/richardbeare/ITKMinimalPathExtraction/blob/2f54421df763e8121c5a3f828e630c85769c81c0/include/itkSpeedFunctionToPathFilter.hxx#L94 Could that be the problem?
I eventually decided that fiddling with the offset wasn't the way to go, as there were several ways that illegal values could influence the interpolator, and fixing the interpolator should fix all of them.
I'd assume that fast marching needs to be run separately from each way point in series, using the next destination as the stopping condition.
Which speed function are you using at present?
I've had trouble getting regular access to a host with enough RAM to reproduce the exact problem, but it looks like I'll need to try again.
@richardbeare would you create a PR here that offers the changes you made to avoid the gradient to be zero? I think it would be a very valuable contribution also for others using ITKMinimalPathExtraction.
Certainly planning to. I suspect that more documentation may be necessary too. Slightly daunted by the number of bug fixes that I've included that change results.
Continuing to post on this issue as long as we do not have a PR where these comments could go: using https://github.com/romangrothausmann/SITK-CLIs/pull/3 to create another speed function and using https://github.com/romangrothausmann/ITK-CLIs/pull/6 (which incorporates https://github.com/romangrothausmann/ITK-CLIs/pull/4), it was possible to extract a path from the start-point to the end-point with regular step gradient descent (RSGD) using a step size smaller than the voxel spacing on a sub-sampled dataset.
So far it is not possible to get a path if a way-point is given as well, nor on the full dataset (which needs https://github.com/romangrothausmann/ITK-CLIs/pull/5 but hitting https://github.com/InsightSoftwareConsortium/ITK/issues/715 with that, see https://github.com/romangrothausmann/ITK-CLIs/pull/5#issuecomment-499818178)
Update: https://github.com/richardbeare/ITKMinimalPathExtraction/tree/ConsolidateInterpolator solves this and the other issues.
Apologies - very busy at present...
So far it is not possible to get a path if a way-point is given as well, nor on the full dataset
@romangrothausmann @richardbeare has this been resolved in the ConsolidateInterpolator branch?
@richardbeare ConsolidateInterpolator looks like excellent progress! Do you mind if I integrate, and we iterate as needed?
Sounds like a good start. Thanks.
So far it is not possible to get a path if a way-point is given as well, nor on the full dataset
@romangrothausmann @richardbeare has this been resolved in the ConsolidateInterpolator branch?
Yes, worked perfectly when I tested with https://github.com/richardbeare/ITKMinimalPathExtraction/commit/f89bc840d7fee36a6e303652e00d338fa35023b7.
Here's a summary of what I remember about the various problems we attempted to address. Thought I would write it down somewhere as reference. These issues are related to approaches that use some form of gradient descent, which is required if sub-voxel precision is required. The minimum neighbour iterator avoids these problems, but does not provide sub-voxel precision.
Some of the problems stem from the use of geodesics derived from masks. Masks have areas of zero speed. The resultant path is meant to be excluded from these regions. The arrival function is infinite in the areas of zero speed. The presence of infinite values in the arrival function leads to problems computing gradients.
The second way that infinite values may be present in the arrival function is early termination of the arrival function calculation. Usually we want early termination for performance reasons. However, if termination is too early then neighbours of target points may have infinite arrival times, leading to problems. There are some heuristics in the original code attempting to deal with this, but they are wrong (incorrectly coupled to the stopping condition) and don't work if the speed function is low in the target area.
I suspect there is no good way to reliably solve the problem without getting really complex. Sometimes the shortest path should go on the edge of a mask, next to infinite values. How can we do a gradient calculation that is certain not to project into the mask? The tracking will fail if the optimizer ends up in the mask (termination by exceeding the step count).
The last two points avoid the need for users to understand the relationship between voxel spacing and speed function value, which can be very tricky. Especially useful to have reliable defaults for these when developing and testing speed functions.
@richardbeare thank you for the summary and thoughts!
Thinking about a way we can approach the issues that is robust but avoids complexity: what do think about constraining / conditioning the speed function and arrival function to avoid the infinite values that are so troublesome. That is, we suggest that the speed function varies from some small value to 1 instead of 0 to 1. And, we replace the infinite arrival function times with a multiple of the max arrival time in the arrival function. ?
The complexity isn't huge, and most of it that isn't part of the new interpolator relates to either stopping conditions or ensuring that a sufficient arrival function is computed; both of those are independent of the infinite values that the user may require. We definitely want to allow infinite values, as they are the only way to implement a certain exclusion zone.
I think the complexity is warranted. Every other approach I can think of is a heuristic that will break at some point. I suspect the only way to go is to be thorough with documentation. e.g. if you have a maze-type problem, with complex exclusion zones and narrow openings, then you are unlikely to be reliably able to use a gradient descent approach in combination with a flat speed function. You may have success with a speed function that is biased towards centre lines, but success isn't certain. Problems will still occur if the paths are narrow.
I suspect that what we are heading towards is one of the few truly tested implementations of this sort of thing around. Apparently even some of the original implementations had arbitrary exclusion criteria that set gradients to zero (I think there is a reference in the comments).
I have experienced many cases where I can extract a path with the IterateNeighborhoodOptimizer but where the GradientDescentOptimizer does not start to descend, even though the speed function is continuous (based on a binary segmentation), apparently because the initial local gradient is calculated to be zero (even though start- and end-point, speed function etc. are kept the same). I have a vague feeling that this thresholding: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/6ae35b9d044ff2be22ce42776c64258be9252313/include/itkSingleImageCostFunction.hxx#L145-L152 to a hard-coded value of `15.0`: https://github.com/InsightSoftwareConsortium/ITKMinimalPathExtraction/blob/6ae35b9d044ff2be22ce42776c64258be9252313/include/itkSingleImageCostFunction.hxx#L140 could be the culprit. Another possible reason I could imagine is that the floating point precision might not suffice under some circumstances to calculate a non-zero gradient, even if there is no extremum in the cost function at the current optimizer position.
I use this CLI for the testing: https://github.com/romangrothausmann/ITK-CLIs/blob/bfb1312142d505cacd6770e4d5acc23475290c8f/min-path_seg.cxx