Closed cpinter closed 6 years ago
2017-07-10 17:27 Thomas Vaughan: I refer to the section that computes the histogram. The problem may or may not involve the image accumulation filter itself. I reworded the description to be more clear.
After a discussion with @tavaughan this is a reproducibility issue, and the best course of action would be to create an automated test on two different input polydata (possibly vtk files from https://github.com/SlicerRt/SlicerRT/tree/master/Testing/Data, with a simple transform involving rotation applied with vtkTransformPolyDataFilter), set an exact baseline (not like for Dvh when we assert a maximum error), and see if that baseline is consistently met.
I've run this test five times. Seems to pass every time.
Still would be good to observe how it behaves over a longer period with more tests...
Sounds good. Thank you, Thomas! If the test is stable for a month or so, then I think we can close this issue.
I see this is not merged into the master. We should do it so that the test actualy runs every night.
Example for comparing two files for a CMake test https://github.com/SlicerRt/SlicerRT/blob/master/DoseAccumulation/Testing/Cxx/CMakeLists.txt#L38-L42
I wrote a c++ version of the same test. The output is different.
(Left side == C++, right side == Python) Comparing 330 and 330 Comparing 136 and 136 Comparing 240 and 240 Comparing 180 and 180 Comparing 172 and 172 Comparing 640 and 640 Comparing 392 and 392 Comparing 576 and 576 Comparing 1734 and 58534 Comparing 155907 and 97079 Comparing 1808 and 3660 Comparing 2082 and 2258 Comparing 2840 and 2840 Comparing 2024 and 2024 Comparing 1882 and 1882 Comparing 1736 and 1736 Comparing 971 and 971 Comparing 1214 and 1214 Comparing 702 and 702 Mismatch found!
I'll commit the code shortly. We plan to output the results from both versions as a csv or text file. Then we'll compare the two outputs to make sure they are the same.
Where is an appropriate place to store these files? I thought I saw a temporary directory variable somewhere, but I can't find it anymore.
(I'll need to know how to access the directory from both Python and C++)
Thanks Csaba, can I also use command line arguments for Python tests? If so how would I do that?
Nevermind -- I think I figured it out:
Start slicer: ./Slicer --myarg Foo
Enter into Python Interactor:
slicer.app.commandOptions().unparsedArguments (u'--myarg', u'Foo')
I usually use --python-code argument, see
https://www.slicer.org/wiki/Documentation/Nightly/ScriptRepository#Launching_Slicer
Do you mean --python-code="slicer.polydatadistancesfile=${TEMP}/PolyDataDistancesOutputPython.csv"
Then in the code I would refer to the slicer.polydatadistancesfile variable?
You can try what is done here: LoadAnnotationRulerScene.py
Thanks, I copied that example and it "works" now.
It's actually producing a different output from either previous incarnation of the test... (Left side == New Python test, Right side == Old Python test) Comparing 94 and 94 Comparing 330 and 330 Comparing 136 and 136 Comparing 240 and 240 Comparing 180 and 180 Comparing 172 and 172 Comparing 640 and 640 Comparing 392 and 392 Comparing 576 and 576 Comparing 58480 and 58534 Comparing 97133 and 97079 Comparing 3660 and 3660 Comparing 2258 and 2258 Comparing 2840 and 2840 Comparing 2024 and 2024 Comparing 1882 and 1882 Comparing 1736 and 1736 Comparing 971 and 971 Comparing 1214 and 1214 Comparing 702 and 702
I'm having trouble setting dependencies on multiple tests. Right now I have two tests to run the filter:
py_PolyDataDistanceHistogramFilterExecutionTestPython vtkPolyDataDistanceHistogramFilterExecutionTestCpp
Once these two tests run, there are two more tests that compare the outputs of the tests:
vtkPolyDataDistancesRawOutputComparisonTest
(compares each individual distance)
vtkPolyDataDistancesHistogramOutputComparisonTest
(compares histograms of the distances)
I've tried using set_tests_properties to set test order, but it only seems to work with one test at a time. If I do:
set_tests_properties(vtkPolyDataDistancesRawOutputComparisonTest PROPERTIES DEPENDS vtkPolyDataDistanceHistogramFilterExecutionTestCpp DEPENDS py_PolyDataDistanceHistogramFilterExecutionTestPython)
vtkPolyDataDistanceHistogramFilterExecutionTestCpp won't run until after all other tests have run.
I also tried:
set_tests_properties(vtkPolyDataDistancesRawOutputComparisonTest PROPERTIES DEPENDS vtkPolyDataDistanceHistogramFilterExecutionTestCpp) set_tests_properties(vtkPolyDataDistancesRawOutputComparisonTest PROPERTIES DEPENDS py_PolyDataDistanceHistogramFilterExecutionTestPython)
with the same result.
How do I set a dependency on multiple tests?
Underlying problem has been identified and fixed. Will remove the redundant python test and commit shortly.
@tavaughan Excellent, thank you! Can you please amend the message for the commit that fixed the issue with what the problem was and how you solved it? Now it's only you who knows... If it's too much of a hassle to amend it, alternatively you can explain it here. Thanks!
Is that a bit more clear?
(Gah, let me fix a typo)
Perfect, thank you! No worries about typos :)
One last thing - I think image extents are supposed to be min/max inclusive. So I made that change too.
I'm not sure what min/max inclusive means. The extent of an image needs to contain the minimum index and the maximum index. For example for an image that is 512 voxels wide, the X extent will be (0,511).
That's what I meant by min/max inclusive.
Sorry, yet another thing. The ground truth output files are ~1-2MB. Should I do something about that?
It would be better to have a size of up to a few hundred KB instead. Maybe decrease size of input data?
I think it should possible to decompress files using CMake.
Merged the latest commit into the previous commit, so there isn't still a 1-2MB file in the repo history.
Excellent, thanks a lot Thomas! I'll integrate the pull request once I finalized the Qt5/VTK8 migration, then close this issue.
The test fails every now and then (an example), so I'll have to disable it because of an issue that prevents dependent extensions from even building if there is a failing test in an extension it depends on (and there are several extensions depending on SlicerRT).
Also reopening the issue until the test is made stable.
Correction: the dependent extension problem has been fixed, see https://issues.slicer.org/view.php?id=4247
I'll keep this issue open though because the tests should only fail if there are actual regressions, and not randomly.
The issue is not that the test fails randomly. The issue is that the test fails on Windows. Please do not remove it - it shows that a problem has started for a particular operating system.
I do not have access to the test data. Is the problem that the histograms are actually different, or is the problem that the output .csv files are different (newlines?)?
Indeed, it is not an undeterministic failure but a consistent Windows failure. I ran the test using my local build, here is the CSV file generated by the failing test. PolyDataDistancesRawOutput.zip
vtkPolyDataDistancesHistogramOutputComparisonTest also fails, here's the test output (GitHub said csv was not supported so needed to zip it). PolyDataDistancesHistogramOutput.zip
The difference in the latter file is only line ending.
The first file, besides line ending has another type of difference, which is about formatting (3.49706e-005 vs 3.49706e-05).
If the CSV can be written with specified line ending and formatting, then it can be a solution. Otherwise the baseline needs to be read in and the entries compared within the test.
@tavaughan I fixed the tests on Windows by adding Linux and Windows style ground truth files and using the proper one on each system. Can you please update your working copy and run the tests on Linux and close this issue if it works? Thanks!
Thanks @cpinter, I'm rebuilding Slicer and will test shortly.
I do not get any errors when testing with the most recent master branch of Slicer and the most recent master branch of SlicerRT:
~/devel/SRTD/inner-build/SegmentComparison$ ctest
Test project /home/vaughan/devel/SRTD/inner-build/SegmentComparison
Start 1: qSlicerSegmentComparisonModuleGenericTest
1/9 Test #1: qSlicerSegmentComparisonModuleGenericTest ............................... Passed 3.00 sec
Start 2: qSlicerSegmentComparisonModuleWidgetGenericTest
2/9 Test #2: qSlicerSegmentComparisonModuleWidgetGenericTest ......................... Passed 1.84 sec
Start 3: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Base
3/9 Test #3: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Base .......... Passed 0.62 sec
Start 4: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_SameInput
4/9 Test #4: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_SameInput ..... Passed 0.63 sec
Start 5: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Transformed
5/9 Test #5: vtkSlicerSegmentComparisonModuleLogicTest_EclipseProstate_Transformed ... Passed 0.66 sec
Start 6: vtkPolyDataDistanceHistogramFilterExecutionTest
6/9 Test #6: vtkPolyDataDistanceHistogramFilterExecutionTest ......................... Passed 1.42 sec
Start 7: vtkPolyDataDistancesRawOutputUnpackTest
7/9 Test #7: vtkPolyDataDistancesRawOutputUnpackTest ................................. Passed 0.01 sec
Start 8: vtkPolyDataDistancesRawOutputComparisonTest
8/9 Test #8: vtkPolyDataDistancesRawOutputComparisonTest ............................. Passed 0.00 sec
Start 9: vtkPolyDataDistancesHistogramOutputComparisonTest
9/9 Test #9: vtkPolyDataDistancesHistogramOutputComparisonTest ....................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 9
Just noted that there is still at least one experimental build that fails: http://slicer.cdash.org/viewTest.php?onlyfailed&buildid=1134376
Will re-open so we can keep an eye on it.
After the latest fix, the same test on the same factory passes http://slicer.cdash.org/viewTest.php?onlyfailed&buildid=1135166 Closing again, hopefully for good.
https://github.com/SlicerRt/SlicerRT/blob/master/SegmentComparison/Logic/vtkPolyDataDistanceHistogramFilter.cxx#L269
The section that accumulates frequencies for the histogram seems to always return a result that looks reasonable at first glance... but if you re-run it with the same data, you sometimes get a slightly different result. I don't see any immediate explanation for this behaviour.
Migrated from https://app.assembla.com/spaces/slicerrt/tickets/865-histogram-from-vtkpolydatadistancehistogramfilter-are-sometimes-different/details