fix value error (and missing data) in table

pieper commented 6 years ago

The old implementation was indexing beyond the end of the rows because it does not add rows for skipped keys, but did use the key index when setting cell values.

Importantly this led to missing data (typically the last two rows due to the Image and Mask keys.

In old VTK python this was silently accepted and might cause memory corruption. But VTK's python wrapping now does bounds checks, so the old implementation led to ValueError exceptions.

This fixed code now tracks the skipped keys and adjusts the row index accordingly.

fedorov commented 6 years ago

I get runtime error below, but I am also getting it without those changes ...

ValueError: expects 0 <= id && id < this->GetNumberOfValues()

pieper commented 6 years ago

Hmm, okay - what circumstance did you try? I had one label in a segmentation of the MRHead data so Image and Mask keys were ignored. I thought the proposed implementation would handle any of the expected cases.

fedorov commented 6 years ago

I simply ran the self-test.

fedorov commented 6 years ago

Did something change in VTK API? It looks like col array needs to be initialized with the size, and also the value need to be strings ...

pieper commented 6 years ago

I see what happens - the _featureNames are only calculated once, so when the test runs the second feature extraction it crashes. Good thing we have Self Tests!

The VTK api didn't change, but as I said it started doing bounds checking, so what previously was memory corruption is now a ValueError, which is an improvement.

The size of the vtk array is set correctly when rows are added for each feature.

I'll update the PR.

fedorov commented 6 years ago

Still is not working for me. The test "passes", but the table only has the header, and I get errors like below. I don't know, perhaps this is due to how I set up the extension - I installed from nightly, and then switched the module paths to point to my local build.

The VTK api didn't change, but as I said it started doing bounds checking, so what previously was memory corruption is now a ValueError, which is an improvement.

It was failing for me at index 0, and I thought that was because the array size was not set, and it should be prior to calling SetValue(): https://www.vtk.org/doc/nightly/html/classvtkStringArray.html#ad498c244f07e77b9a86856dc003e5629. But perhaps that was something else.

Not sure if we should merge and try tomorrow with a clean nightly.

fedorov commented 6 years ago

Steve can you rebase you branch to the latest master and test in that mode?

pieper commented 6 years ago

Can you send the full error log? It looks like the csv file format for how the keys are stored changed dramatically but I don't see any commit that should have caused that.

Regarding the length of the column, it is adjusted by added row. If all the keys failed to parse then no rows would have been added.

fedorov commented 6 years ago

I started looking, and it appears that output features are parsed from the console output, not output csv. Not sure why. Maybe best to just discuss on Wednesday.

JoostJM commented 6 years ago

@fedorov, @pieper, yes I used the output as it removed some unnecessary IO. I'll take a look at the code and try to see whats happening. I believe I may know whats causing the error here.

fedorov commented 6 years ago

@JoostJM I would strongly recommend using file IO and get features from CSV. It is too error-prone to rely on console output. Parsing console output is very uncommon (both in Slicer and outside), and developers will not think about it while developing an (independent from Slicer!) pyradiomics library and producing debug messages. We may also get console output from dependency libraries. So many things can interfere! Most importantly, it is really not necessary, as there is no overhead/complexity in parsing the CSV. You can use slicer.app.temporaryPath to save it in a temp file.

fedorov commented 6 years ago

And I owe an apology - I should have caught this IO issue at the time the feature was introduced. Sorry about missing this. I don't recall if I reviewed that PR.

JoostJM commented 6 years ago

No problem, I'll revert that change.

JoostJM commented 6 years ago

@pieper, @fedorov, I reverted the change so it now makes use of a temporary tablenode again. Additionally, I updated how it fills the table a bit. Instead of keeping track of which keys are skipped, I just build up a dictionary of keys and their corresponding row indices. If a new key is found, it is added as a row, and the dictionary is later used to select that row.

pieper commented 6 years ago

The changes make sense to me 👍

fedorov commented 6 years ago

I confirm it works for me now in my testing!

Thank you!

👍

AIM-Harvard / SlicerRadiomics

fix value error (and missing data) in table #45