Histogram from vtkPolyDataDistanceHistogramFilter are sometimes different

cpinter commented 7 years ago

https://github.com/SlicerRt/SlicerRT/blob/master/SegmentComparison/Logic/vtkPolyDataDistanceHistogramFilter.cxx#L269

The section that accumulates frequencies for the histogram seems to always return a result that looks reasonable at first glance... but if you re-run it with the same data, you sometimes get a slightly different result. I don't see any immediate explanation for this behaviour.

Migrated from https://app.assembla.com/spaces/slicerrt/tickets/865-histogram-from-vtkpolydatadistancehistogramfilter-are-sometimes-different/details

cpinter commented 7 years ago

2017-07-10 17:27 Thomas Vaughan: I refer to the section that computes the histogram. The problem may or may not involve the image accumulation filter itself. I reworded the description to be more clear.

cpinter commented 6 years ago

After a discussion with @tavaughan this is a reproducibility issue, and the best course of action would be to create an automated test on two different input polydata (possibly vtk files from https://github.com/SlicerRt/SlicerRT/tree/master/Testing/Data, with a simple transform involving rotation applied with vtkTransformPolyDataFilter), set an exact baseline (not like for Dvh when we assert a maximum error), and see if that baseline is consistently met.

tavaughan commented 6 years ago

I've run this test five times. Seems to pass every time.

Still would be good to observe how it behaves over a longer period with more tests...

cpinter commented 6 years ago

Sounds good. Thank you, Thomas! If the test is stable for a month or so, then I think we can close this issue.

cpinter commented 6 years ago

I see this is not merged into the master. We should do it so that the test actualy runs every night.

cpinter commented 6 years ago

Example for comparing two files for a CMake test https://github.com/SlicerRt/SlicerRT/blob/master/DoseAccumulation/Testing/Cxx/CMakeLists.txt#L38-L42

tavaughan commented 6 years ago

I wrote a c++ version of the same test. The output is different.

(Left side == C++, right side == Python) Comparing 330 and 330 Comparing 136 and 136 Comparing 240 and 240 Comparing 180 and 180 Comparing 172 and 172 Comparing 640 and 640 Comparing 392 and 392 Comparing 576 and 576 Comparing 1734 and 58534 Comparing 155907 and 97079 Comparing 1808 and 3660 Comparing 2082 and 2258 Comparing 2840 and 2840 Comparing 2024 and 2024 Comparing 1882 and 1882 Comparing 1736 and 1736 Comparing 971 and 971 Comparing 1214 and 1214 Comparing 702 and 702 Mismatch found!

I'll commit the code shortly. We plan to output the results from both versions as a csv or text file. Then we'll compare the two outputs to make sure they are the same.

tavaughan commented 6 years ago

Where is an appropriate place to store these files? I thought I saw a temporary directory variable somewhere, but I can't find it anymore.

(I'll need to know how to access the directory from both Python and C++)

cpinter commented 6 years ago

Location: https://github.com/SlicerRt/SlicerRT/tree/master/Testing/Data Usage: https://github.com/SlicerRt/SlicerRT/blob/master/DoseComparison/Testing/Cxx/CMakeLists.txt#L31

tavaughan commented 6 years ago

Thanks Csaba, can I also use command line arguments for Python tests? If so how would I do that?

tavaughan commented 6 years ago

Nevermind -- I think I figured it out:

Start slicer: ./Slicer --myarg Foo

Enter into Python Interactor:

slicer.app.commandOptions().unparsedArguments (u'--myarg', u'Foo')

lassoan commented 6 years ago

I usually use --python-code argument, see

https://www.slicer.org/wiki/Documentation/Nightly/ScriptRepository#Launching_Slicer

tavaughan commented 6 years ago

Do you mean --python-code="slicer.polydatadistancesfile=${TEMP}/PolyDataDistancesOutputPython.csv"

Then in the code I would refer to the slicer.polydatadistancesfile variable?

lassoan commented 6 years ago

You can try what is done here: LoadAnnotationRulerScene.py

tavaughan commented 6 years ago

Thanks, I copied that example and it "works" now.

It's actually producing a different output from either previous incarnation of the test... (Left side == New Python test, Right side == Old Python test) Comparing 94 and 94 Comparing 330 and 330 Comparing 136 and 136 Comparing 240 and 240 Comparing 180 and 180 Comparing 172 and 172 Comparing 640 and 640 Comparing 392 and 392 Comparing 576 and 576 Comparing 58480 and 58534 Comparing 97133 and 97079 Comparing 3660 and 3660 Comparing 2258 and 2258 Comparing 2840 and 2840 Comparing 2024 and 2024 Comparing 1882 and 1882 Comparing 1736 and 1736 Comparing 971 and 971 Comparing 1214 and 1214 Comparing 702 and 702

tavaughan commented 6 years ago

I'm having trouble setting dependencies on multiple tests. Right now I have two tests to run the filter: py_PolyDataDistanceHistogramFilterExecutionTestPython vtkPolyDataDistanceHistogramFilterExecutionTestCpp

Once these two tests run, there are two more tests that compare the outputs of the tests: vtkPolyDataDistancesRawOutputComparisonTest (compares each individual distance) vtkPolyDataDistancesHistogramOutputComparisonTest (compares histograms of the distances)

I've tried using set_tests_properties to set test order, but it only seems to work with one test at a time. If I do:

set_tests_properties(vtkPolyDataDistancesRawOutputComparisonTest PROPERTIES DEPENDS vtkPolyDataDistanceHistogramFilterExecutionTestCpp DEPENDS py_PolyDataDistanceHistogramFilterExecutionTestPython)

vtkPolyDataDistanceHistogramFilterExecutionTestCpp won't run until after all other tests have run.

I also tried:

set_tests_properties(vtkPolyDataDistancesRawOutputComparisonTest PROPERTIES DEPENDS vtkPolyDataDistanceHistogramFilterExecutionTestCpp) set_tests_properties(vtkPolyDataDistancesRawOutputComparisonTest PROPERTIES DEPENDS py_PolyDataDistanceHistogramFilterExecutionTestPython)

with the same result.

How do I set a dependency on multiple tests?

tavaughan commented 6 years ago

Underlying problem has been identified and fixed. Will remove the redundant python test and commit shortly.

cpinter commented 6 years ago

@tavaughan Excellent, thank you! Can you please amend the message for the commit that fixed the issue with what the problem was and how you solved it? Now it's only you who knows... If it's too much of a hassle to amend it, alternatively you can explain it here. Thanks!

tavaughan commented 6 years ago

Is that a bit more clear?

(Gah, let me fix a typo)

cpinter commented 6 years ago

Perfect, thank you! No worries about typos :)

tavaughan commented 6 years ago

One last thing - I think image extents are supposed to be min/max inclusive. So I made that change too.

cpinter commented 6 years ago

I'm not sure what min/max inclusive means. The extent of an image needs to contain the minimum index and the maximum index. For example for an image that is 512 voxels wide, the X extent will be (0,511).

tavaughan commented 6 years ago

That's what I meant by min/max inclusive.

tavaughan commented 6 years ago

Sorry, yet another thing. The ground truth output files are ~1-2MB. Should I do something about that?

lassoan commented 6 years ago

It would be better to have a size of up to a few hundred KB instead. Maybe decrease size of input data?

cpinter commented 6 years ago

I think it should possible to decompress files using CMake.

tavaughan commented 6 years ago

Merged the latest commit into the previous commit, so there isn't still a 1-2MB file in the repo history.

cpinter commented 6 years ago

Excellent, thanks a lot Thomas! I'll integrate the pull request once I finalized the Qt5/VTK8 migration, then close this issue.

cpinter commented 6 years ago

The test fails every now and then (an example), so I'll have to disable it because of an issue that prevents dependent extensions from even building if there is a failing test in an extension it depends on (and there are several extensions depending on SlicerRT).

Also reopening the issue until the test is made stable.

cpinter commented 6 years ago

Correction: the dependent extension problem has been fixed, see https://issues.slicer.org/view.php?id=4247

I'll keep this issue open though because the tests should only fail if there are actual regressions, and not randomly.

tavaughan commented 6 years ago

The issue is not that the test fails randomly. The issue is that the test fails on Windows. Please do not remove it - it shows that a problem has started for a particular operating system.

I do not have access to the test data. Is the problem that the histograms are actually different, or is the problem that the output .csv files are different (newlines?)?

cpinter commented 6 years ago

Indeed, it is not an undeterministic failure but a consistent Windows failure. I ran the test using my local build, here is the CSV file generated by the failing test. PolyDataDistancesRawOutput.zip

cpinter commented 6 years ago

vtkPolyDataDistancesHistogramOutputComparisonTest also fails, here's the test output (GitHub said csv was not supported so needed to zip it). PolyDataDistancesHistogramOutput.zip

cpinter commented 6 years ago

The difference in the latter file is only line ending.

The first file, besides line ending has another type of difference, which is about formatting (3.49706e-005 vs 3.49706e-05).

If the CSV can be written with specified line ending and formatting, then it can be a solution. Otherwise the baseline needs to be read in and the entries compared within the test.

cpinter commented 6 years ago

@tavaughan I fixed the tests on Windows by adding Linux and Windows style ground truth files and using the proper one on each system. Can you please update your working copy and run the tests on Linux and close this issue if it works? Thanks!

tavaughan commented 6 years ago

Thanks @cpinter, I'm rebuilding Slicer and will test shortly.

tavaughan commented 6 years ago

I do not get any errors when testing with the most recent master branch of Slicer and the most recent master branch of SlicerRT:

~/devel/SRTD/inner-build/SegmentComparison$ ctest
Test project /home/vaughan/devel/SRTD/inner-build/SegmentComparison
    Start 1: qSlicerSegmentComparisonModuleGenericTest
1/9 Test #1: qSlicerSegmentComparisonModuleGenericTest ...............................   Passed    3.00 sec
    Start 2: qSlicerSegmentComparisonModuleWidgetGenericTest
2/9 Test #2: qSlicerSegmentComparisonModuleWidgetGenericTest .........................   Passed    1.84 sec
    Start 3: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Base
3/9 Test #3: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Base ..........   Passed    0.62 sec
    Start 4: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_SameInput
4/9 Test #4: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_SameInput .....   Passed    0.63 sec
    Start 5: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Transformed
5/9 Test #5: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Transformed ...   Passed    0.66 sec
    Start 6: vtkPolyDataDistanceHistogramFilterExecutionTest
6/9 Test #6: vtkPolyDataDistanceHistogramFilterExecutionTest .........................   Passed    1.42 sec
    Start 7: vtkPolyDataDistancesRawOutputUnpackTest
7/9 Test #7: vtkPolyDataDistancesRawOutputUnpackTest .................................   Passed    0.01 sec
    Start 8: vtkPolyDataDistancesRawOutputComparisonTest
8/9 Test #8: vtkPolyDataDistancesRawOutputComparisonTest .............................   Passed    0.00 sec
    Start 9: vtkPolyDataDistancesHistogramOutputComparisonTest
9/9 Test #9: vtkPolyDataDistancesHistogramOutputComparisonTest .......................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 9

tavaughan commented 6 years ago

Just noted that there is still at least one experimental build that fails: http://slicer.cdash.org/viewTest.php?onlyfailed&buildid=1134376

Will re-open so we can keep an eye on it.

cpinter commented 6 years ago

After the latest fix, the same test on the same factory passes http://slicer.cdash.org/viewTest.php?onlyfailed&buildid=1135166 Closing again, hopefully for good.

SlicerRt / SlicerRT

Histogram from vtkPolyDataDistanceHistogramFilter are sometimes different #72