QIICR / QuantitativeReporting

Segmentation-based measurements with DICOM import and export of the results.
https://qiicr.gitbooks.io/quantitativereporting-guide
Other
23 stars 23 forks source link

Add support of quantitation plugins to SegmentStatistics #161

Closed fedorov closed 7 years ago

fedorov commented 7 years ago

From an email exchange:

@fedorov:

2) ability to extend SegmentStatistics with extra quantitation plugins. It would be nice to be able to register new plugins with that module, and then enable plugins that make sense depending on the input modality.

@lassoan:

SegmentStatistics module has been designed to make extension with additional metrics very easy.

Basically, you just need to call all additional statistics computation methods at the end of SegmentStatisticsLogic.computeStatistics. The method has to add its own measurement names to SegmentStatisticsLogic.keys and measurement values to SegmentStatisticsLogic.statistics.

What is missing is the plugin mechanism. For example some (probably static) register/unregister method in SegmentStatisticsLogic that would add/remove additional stats computation methods in some global variable in the slicer namespace

@chribaue:

What do you think about looking how to extend SegmentStatistics with a capability to incorporate quantitation plugins, while Andras is working on adding support of quantities/units?

I can certainly do that.

chribaue commented 7 years ago

Hi Andras, Andrey,

adding a static register method to SegmentStatisticsLogic in order to compute additional measurements would be easy, indeed. However, there are two other issues we should also consider in this context.

Currently, the only interface in the SegmentStatisticsLogic to trigger computation (SegmentStatisticsLogic.computeStatistics) calculates all measurements for all segments, which is perfectly fine for many applications. But I would need for our task (PET tumor measurements in the QuantitativeReporting module) two additional features where additional interface methods would be needed:

(a) We want to automatically update measurements whenever a segment in a segmentation gets modified. Recalculation of all measurements for all segments might take much more time than just updating the measurements for this one segment. For our task, calculation of SUV peak takes about 1 second per segment. If there are ten segments in the dataset, running SegmentStatisticsLogic.computeStatistics with the additional SUV peak would take 10 seconds where the user interface would not be responsive. Therefore, a SegmentStatisticsLogic.updateStatisticsForSegment method or something similar would be needed. The Quantitative Reporting module would have to observe changes to the segments and call this method (instead of https://github.com/QIICR/QuantitativeReporting/blob/master/QuantitativeReporting.py#L404)

(b) A user may want to select only a specific set of measurements to compute/report. Via the QuantitativeReporting module, we would like to write only specific measurements to the DicomSRs instead of a list of all measurements other developers considered useful. So, either we add to SegmentStatisticsLogic functionality to select/compute only these interesting measurements, or we add functionality to QuantitativeReporting to select/filter out only measurements a user wants to see/export.

What are your thoughts on this?

fedorov commented 7 years ago

@chribaue I agree with both points. It makes a lot of sense to support those scenarios in SegmentStatistics. For (b), I think we should do both - allow selection of feature calculators in SegmentStatisticsLogic, and also have an option to expose selection to the user. If there is a convention about defining groups of measurements, and controlling enabling/disabling of the individual measurements at the SegmentStatistics plugin level, we should be able to auto-generate the UI elements for the user from the description of the plugin API. Should we use JSON for defining plugin measurement groups, individual features, and quantities/units that result from the calculations?

lassoan commented 7 years ago

Yes, I agree.

Probably it's better to define plugins by using a class interface instead of a single functions. Similarly how it is done in the DICOMPatcher module: DICOMPatcherRule is the base class for all rules.

To choose which measurements to perform, a list of checkboxes shown in an Advanced section would work (similarly to how the DICOM plugin selector works).

Would plugins have user-configurable options?

chribaue commented 7 years ago

@lassoan: I would not need user-configurable options for my task, but we should leave this option open.

Right now I'm thinking of something like this:

class SegmentStatisticsCalculatorBase(object): # base class for calculators
  def __init__(self):
    self.name = ""
    self.keys = () # all supported measurements
    self.requestedKeys = () # all measurements that will be calculated
    self.optionsWidget = qt.QWidget()
  def computeStatistics(self, segmentationNode, segmentID, grayscaleNode=None):
    pass
  def createDefaultOptionsWidget(self):
    ... creates list of checkboxes that allow selection of requested keys   
  def register(self):
    # something similar to
   https://github.com/Slicer/Slicer/blob/eefeac9286f48552e8418a11f412b2823a09b407/Modules/Loadable/Segmentations/EditorEffects/Python/AbstractScriptedSegmentEditorEffect.py#L43

class GrayscaleVolumeStatisticsCalculator(SegmentStatisticsCalculatorBase):
  def __init__(self):
    super(GrayscaleVolumeStatisticsCalculator,self).__init__()
    self.name = "Grayscale Volume Statistics"
    self.keys = ("GS voxel count", "GS volume mm3", "GS volume cc", "GS min", "GS max", "GS mean", "GS stdev")
    self.requestedKeys = ("GS voxel count", "GS volume mm3", "GS volume cc", "GS min", "GS max", "GS mean", "GS stdev")
    super(GrayscaleVolumeStatisticsCalculator,self).createDefaultOptionsWidget()
    ... developer may add extra options to configure other parameters
  def computeStatistics(self, segmentationNode, segmentID, grayscaleNode=None):
    statistics = {}
    for key in self.requestedKeys:
      statistics[key] = ...
    return statistics

SegmentStatisticsLogic will get functionality to specify which calculators will we applied. SegmentStatisticsWidget will get some upgrades to the user interface: check list to select calculators and display of the calculators' optionsWidgets

fedorov commented 7 years ago

We should see how we can make this work with this library - it will be in demand, and it is a good use case to test the design: https://github.com/radiomics/pyradiomics

Some of the options there will indeed have parameters.

lassoan commented 7 years ago

@chribaue It looks good to me!

fedorov commented 7 years ago

@chribaue I suggest that semantics of the input volume ("greyscale") should be separate from the semantics of the measurement ("count") and separate from the measurement units ("mm3"). It is not a good idea to parse these out of the current "keys", and it will not generalize. We can have default values: when input image is loaded not from DICOM and has no information about its content, we can then assign "greyscale", and for units we can assign "no units" or something like this as a default. It could also be good to have the capacity to query codes for the measurements/units to address the DICOM serialization needs.

chribaue commented 7 years ago

@fedorov I agree that each measurement should not just have a short (hard to interpret) "key", but an actual longer understandable name and proper units. And DICOM codes.

But we still need a short label for the produced measurement table's header, that should represent the most important information to the user.

I think the reason why LabelStatistics appends measurements with "GS" "LM" or "CS" is because each of these current three "Calculators" may e.g. add several volume measurements which would be calculated slightly differently. There's "GS volume mm3", "GS volume cc", "LM volume cc", "CS volume cc" and so on. Altogether 6 volume measurements already... and there needs to be way that a user of the produced measurement table knows where the measurement came from.

Other modules/classes (e.g. the QuantitativeReporting module) use the keys of LabelStatistics class already to derive other information about the measurement result. And I think the interface of LabelStatistics should remain backwards compatible.

We could later add to LabelStatisticsLogic a method getCalculator(self, key), that returns the calculator and then query methods to the calculator to obtain more description information like quantities, dicom codes... for each of it's measurements.

chribaue commented 7 years ago

@ fedorov I took a quick look a https://github.com/Radiomics/SlicerRadiomics/blob/master/SlicerRadiomics/SlicerRadiomics.py and I think it would fit into the current design.

lassoan commented 7 years ago

I'm working on adding column metadata support to table nodes, such as quantity, unit, description, which may be used for generating tooltips or detailed reports. So, instead of a list of strings for keys we'll probably have a list of dictionaries that will store the key but also this additional metadata.

chribaue commented 7 years ago

@lassoan this would help. When will it be ready to use?

lassoan commented 7 years ago

I try to prepare a first testable version by the end of the week.

chribaue commented 7 years ago

Okay, I'll start using keys for now and we'll replace them by something else later

fedorov commented 7 years ago

I agree that each measurement should not just have a short (hard to interpret) "key", but an actual longer understandable name and proper units. And DICOM codes.

But we still need a short label for the produced measurement table's header, that should represent the most important information to the user.

Yes, I agree. Then this key can be used to query semantics of the produced measurement from the plugin at runtime.

chribaue commented 7 years ago

I've got a first version of the Segment Statistics Module with plugins working and would like to solicit some input from you guys on how the user interface should look like. I attached a screenshot of the current version.

As mentioned earlier, the different plugins are subclasses of SegmentStatisticsCalculatorBase, each of them has a list of keys (measurements it can calculate) and requestedKeys (measurements the user wants the plugin to calculate). Instances are registered to the SegmentStatisticsLogic using a static method and the class creates an instance of each registered calculator upon creation. SegmentStatisticsCalculatorBase creates for each calculator a widget that allows the user to select the measurements they are interested in.

How would you envision the user interface?

screenshot from 2017-05-08 15 03 42

lassoan commented 7 years ago

It looks very nice! Maybe you could add a "default" option in addition to "all" and "none" (plugins may have metrics that are rarely needed or takes a long time to compute, which may be disabled by default).

Where do you save the status of the checkboxes. Probably the best would be to save them to a parameter node (vtkMRMLScriptedModuleNode - http://apidocs.slicer.org/master/classvtkMRMLScriptedModuleNode.html#abdb7bd603db7ccf66fc048672f111fdc).

chribaue commented 7 years ago

Right now the state is stored in the calculator classes themselves, but I like the idea of storing it separately in a mrml node that all can access; plus it allows storing the user's configuration as part of the mrml scene.

What do you think makes more sense: (a) Having a singleton instance of vtkMRMLScriptedModuleNode like in https://github.com/Slicer/Slicer/blob/ffd102a4cb353d66872ecbd06d210eddf4d76506/Modules/Scripted/EditorLib/EditUtil.py#L27 that can be accessed from everywhere in Slicer to access/modify the configuration. (b) Or should we have something like the CLI's parameter sets that the user can switch between?

lassoan commented 7 years ago

Yes, similarly to CLIs, allow the user to select from multiple parameter nodes (presets). Using a singleton node would impose too many limitations (there could be side effects when you load a scene or use the module from another module etc). See for example how the parameter node is implemented in PlusRemote.

fedorov commented 7 years ago

@chribaue thanks for the update! Looks great for the initial iteration!

I have few suggestions (for now or for later):

chribaue commented 7 years ago

@fedorov

we could consider a different widget for displaying and selecting the individual features

I think the problem with designing the widget for configuring a plugin is that different plugins may have different needs. Right now, it's only about selecting features. But other plugins may also need setting of other parameters to configure the feature calculation. That's why in my current design the different plugins are able to create their own configuration widget and the base class only provides a convenience method to create the default widget that you see right now.

maybe one tab per plugin is better than a collapsible button?

I think with more modules, the list of tabs would get inconvenient to use. As an alternative, we might display a list of plugins at the top of the settings section, the user can select a plugin, and then the plugin's configuration widget gets displayed below.

I am still not quite sure individual feature names should be prepended with abbreviations "LM/GS/PT"

The configuration widget should display more meaningful names for the features. But what would you use to distinguish different different volume measurements from different plugins in the measurements table's header?

I think parameters node should not be a singleton.

Okay, I'll make it configurable

do you have a branch to look at the code?

I'll put it somewhere for you to look at.

fedorov commented 7 years ago

different plugins may have different needs [...] the base class only provides a convenience method to create the default widget that you see right now

Yes, I agree. I was just suggesting that a table/list with a search bar on top could be a more convenient default widget

I think with more modules, the list of tabs would get inconvenient to use

With more modules, any approach would become cumbersome. However, if you consider that plugin selector should only occupy a portion of the module panel, it could easier to control that appearance with a tabbed view. Combined with the list widget I suggested earlier, we could potentially provide reasonably convenient fixed height UI.

But what would you use to distinguish different different volume measurements from different plugins in the measurements table's header?

Here's one idea - what if we transpose the measurements table to have segments as columns, and features as rows? This would be a more meaningful use of space, since feature names are longer. We could also then add a column with units for each feature. It would then also be easier to visualize long feature names that we would get by prepending "volume" (in your example) with the plugin name. Or we can have a separate column for the plugin name that produced individual measurement.

Yet another way to do it is to have a separate row for each combination of label/feature, that's how we do it in SlicerRadiomics:

image

Many options to consider - we should definitely not feel stuck with the way features are presented in the current table!

@lassoan is it possible to make the header of the table "sticky" so that it stays in place when the table is scrolled?

fedorov commented 7 years ago

@chribaue the idea to show plugin list/dropdown is a good one too!

lassoan commented 7 years ago

Tabbed view only works up to 3-4 tabs, because tabs require lots of horizontal space and the module widget is relatively narrow. So, I wouldn't recommend to have a tab for each plugin. Vertical tabs might work somewhat better, but those are not very easy to use either. A sortable, filterable (show all or show metrics provided by a selected plugin) table might be work well. You could show options for the selected table row. For now I would go with the simplest and easiest to implement GUI. When number of plugins and metrics are growing and we'll have metric options, etc. then we can refactor the GUI as needed.

To make the top row of a table "sticky": click on "Lock first row" button in Tables module (or call tableNode.SetUseColumnNameAsColumnHeader(True)).

cpinter commented 7 years ago

Great discussion and progress!

I like the plugin interface and the GUI as well (we can change how we present the list of checkboxes quite easily later if needed). I also agree that a non-singleton parameter node would be better. It is the Slicer convention with good reason (e.g. loading scenes, reusability, presets).

One question: fractional labelmap support is coming soon (basically once OpenGL2 becomes the default in Slicer), and calculating statistics will be needed on those too. For those who are not familiar with it, it's basically another representation for the segments, which, instead of binary voxel values for the labelmap (binary lablemap), stores a fractional value representing the ratio or probability of the voxel belonging to the segment. How do you think those could be supported? Different plugins for the fractional calculations, or enabling the main module and plugins to handle fractional labelmaps?

fedorov commented 7 years ago

Well, to keep it in spirit of "simplest and easiest", I would leave fractional label maps for later.

cpinter commented 7 years ago

The reason I brought it up is that there is a new design being finalized right now, and if we don't think about this, then it may be really hard to add later. I'd like to make sure that this design accommodates it in some ways, and if it doesn't, then we can potentially make some slight changes to facilitate fractional lablemaps in the future.

chribaue commented 7 years ago

The calculator plugins get a handle to the segmentation and the grayscale volume and then return some measurements for the segments of the segmentation. It's up to the plugin to decide if it wants to obtain statistics based on the segments binary labelmap representation, closed surface representation, or fractional labelmap representation. Right now, there's one basic calculator for Labelmaps and one for closed surfaces. It should be easy to add one for fractional labelmaps.

cpinter commented 7 years ago

Thanks! So I guess the way to go will be to add a fractional labelmap variant to some of the plugins as new plugins. Sounds good to me!

chribaue commented 7 years ago

@fedorov

do you have a branch to look at the code?

Here's the current code: https://github.com/chribaue/Slicer/blob/update-LabelStatistics/Modules/Scripted/SegmentStatistics/SegmentStatistics.py

It's not finished yet, but registering plugins works and parameters are now stored in a vtkMRMLScriptedModuleNode.

My plans for the next steps are: (a) improve the interface for the Plugins to enable requesting information about the measurements .based on the keys; i.e. a more meaningful name, units, and probably DICOM codes for the units. (b) finish implementation of my PET measurements plugin. (c) after that we can maybe improve the user interface and see how to integrate the selection into the Quantitative Reporting module. (d) some automated tests might be nice. This functionality will be part of the Slicer core.

lassoan commented 7 years ago

some automated tests might be nice.

At least add a test for the main success scenario (or anything that you would feel embarrassed about if it was not working because you did not test it).

There is already a test in the SegmentStatistics (test_SegmentStatisticsBasic) that may need only a few small additions to test that you can enable/disable computation of certain metrics and you can register a plugin.

fedorov commented 7 years ago

@lassoan I hate to remind you again, but 17 days ago you said you would prepare the first stable version in a week, and there were no updates. Is this harder than you expected, or you just were not able to allocate the time?

lassoan commented 7 years ago

Sorry, it's just allocating time. Not too much work left, I just need an uninterrupted day or so and it's difficult to find that. Let's have a meeting late next week to discuss status and next steps. Would Thursday anytime before 3pm or Friday anytime work for you?

chribaue commented 7 years ago

I'd like to get your input on two points regarding the class interfaces:

(a) We wanted a way to obtain more meaningful information about the type of measurement including a readable name, a description, units, and dicom codes. I'd propose to add to the measurement calculator class a method getMeasurementInfo that returns a dictionary containing this information. So it's easy to add extra information. I also use it to provide a more easily understandable user interface (see screenshot below). The method requires the measurement's key as a parameter, but also the gray-scale volume node to obtain correct units. Therefore the signature would be def getMeasurementInfo(self, key, grayscaleNode=None). Should the method's interface also include the segmentationNode? Can anyone think of a scenario where the segmentationNode might also be needed?

screenshot from 2017-05-19 15 44 41

(b) We'd like to have a way to update only the statistical measurements for a specific segment (instead of all segments in the segmentation). Currently SegmentStatisticsLogic has the method def computeStatistics(self, segmentationNode, grayscaleNode, visibleSegmentsOnly = True) which is responsible for all the computations. We could have a method def updateStatisticsForSegment(self, segmentID) which requires that computeStatistics was called first, or instead a method def computeStatisticsForSegment(self, segmentationNode, segmentID, grayscaleNode=None), which doesn't require this prior call to computeStatistics. What would you prefer?

fedorov commented 7 years ago

I'd propose to add to the measurement calculator class a method getMeasurementInfo that returns a dictionary containing this information.

Did you consider to return the complete dictionary, including the measurements, as the output of the calculator? If it is returned in a separate call, I think it will be more error-prone. Why would you want to separate those?

Therefore the signature would be def getMeasurementInfo(self, key, grayscaleNode=None)

I suggest we stay away from "grayscale". How about "imageNode" and "segmentationNode"? I don't like it even when it is used for variable names.

I also just today looked over your code. Is there really a reason to separate GS and LM features? I think it is a common convention that for any calculator, there will be an image volume and a label. The propagation of LM and GS is confusing to me. I also suggest we consider this document [1] that comes out of a consensus group on how to define various quantitative features. We are following it as much as we can in pyradiomics development, and I think it will help if we are consistent here as well.

[1] https://arxiv.org/abs/1612.07003

Should the method's interface also include the segmentationNode?

One situation where it may be needed is that for some features image volume is not necessary, and all features can be calculated just from the segmentation node (e.g., shape features, or many of the morphological features from [1] referenced above). So perhaps this is needed indeed.

or instead a method def computeStatisticsForSegment(self, segmentationNode, segmentID, grayscaleNode=None), which doesn't require this prior call to computeStatistics

I would prefer this, since I think it is more fool-proof.

chribaue commented 7 years ago

Did you consider to return the complete dictionary, including the measurements, as the output of the calculator? If it is returned in a separate call, I think it will be more error-prone. Why would you want to separate those?

I need to query for certain information before actually calculating something or having actual data available (e.g. the name of the measurement) to build the UI. And some information will only be available once a specific dataset has been provided (e.g .unit of the measurement). But you're right, returning the units should be part of the measurement result. I guess, we'll split it up, getMeasurementInfo will return what it can based on the information available, and computeStatistics will also return the actual units.

I suggest we stay away from "grayscale". How about "imageNode" and "segmentationNode"? I don't like it even when it is used for variable names.

It's called grayscale everywhere in the code and also the user interface. I think both terms are easily understandable. We should just be consistent and I followed what was already there.

Is there really a reason to separate GS and LM features?

Again, I followed what was already there in the interface. Also, it makes sense because LM operates on the labelmap only, GS operates on the grayscale volume. I am just not sure why the gray-scale statistics include volume measurements. That's quite redundant. But as discussed above, maybe some users use the GS volume measurements already. So I don't want to change the interface.

Andras, what's your point on this? I think you're the original developer of this extension and I only wanted to extend your interface instead of re-designing it.

fedorov commented 7 years ago

Maybe this is also something that is easier to discuss on Thursday when we talk.

lassoan commented 7 years ago

I agree that grayscale may not be the best name, as it refers to how it is displayed (if a non-gray colormap is used then it is quite inaccurate). Maybe "scalar image" or "intensity image" would be a bit more correct. Or, simply "image" if the plugin can compute metrics for color/vector images, too.

Grayscale volume may be different from the labelmap volume, because it only includes that part of the grayscale volume that is inside the labelmap.

chribaue commented 7 years ago

@lassoan

@fedorov, @che85 and I discussed some design decisions about the extending the SegmentStatisticsModule, we’d like you to review.

I’ve already integrated a plugin mechansim and switched to a ScriptedModuleParameterNode to configure what measurements the plugins/SegmentStatisticsLogic are supposed to calculate: https://github.com/chribaue/Slicer/blob/update-LabelStatistics/Modules/Scripted/SegmentStatistics/SegmentStatistics.py In this current version, I kept the interface of SegmentStatisticsLogic untouched; everything is fully backwards compatible.

However, @fedorov, @che85, and I are considering making two changes to the interface:

@lassoan What do you think? Do you agree with these proposed changes?

fedorov commented 7 years ago

I will follow up with some comments later, but don't have time at the moment...

fedorov commented 7 years ago

I wanted to add to the second point of @chribaue. What I suggested is that currently the code essentially assumes that statistics items keys are globally unique. Considering different statistics calculators can be applied to the same input and combined into the same table, I suggest to use the following convention for measurement keys: <calculator name>.<measurement within calculator key>.

chribaue commented 7 years ago

@fedorov I would strip the class names LabelmapSegmentStatisticsCalculator, GrayscaleSegmentStatisticsCalculator etc. down to Labelmap, Grayscale etc. to derive the measurement key. The ...SegmentStatisticsCalculator is redundant

fedorov commented 7 years ago

@chribaue sure, makes sense

fedorov commented 7 years ago

@chribaue for the reference, this is how the attributes of the volume are initialized for a DICOM parametric map:

image

It is also consistent with how they are initialized for the dcmqi converter: https://github.com/QIICR/dcmqi/blob/master/doc/examples/pm-example.json#L9-L23

So perhaps it can be done the same way for the statistics plugin measurements?

lassoan commented 7 years ago

It would make sense to store all information necessary for processing in the parameter node instead of passing them to the logic’s methods or storing them as members in the logic.

Sounds good. It could be useful to keep some lower-level API exposed that allows computations without the need of setting up a parameter node.

The calculated statistical measurements would also be stored in the parameterNode

I guess you mean in the table node associated with the parameter node. Storing computation results directly in the parameter node would lead to data duplication (and so potential data inconsistency, etc) and would make the entire table metadata storage pointless.

We’d like to replace the current keys with more self explanatory terms.

Yes, sure.

Would such a key also be acceptable for the table header or how else could we provide a suitable table header?

I would try to keep column header as short as the values it contains, otherwise it would waste a lot of space and would be difficult to browse and show a reasonable amount of data on screen. However, you can add the long name to the column metadata (description or long name), which is displayed when you hover over the column with your mouse.

fedorov commented 7 years ago

The calculated statistical measurements would also be stored in the parameterNode

I guess you mean in the table node associated with the parameter node.

I think it would make sense to store the pointer to the table node in the parameter node.

lassoan commented 7 years ago

I think it would make sense to store the pointer to the table node in the parameter node.

Yes, for sure (as a node reference)

chribaue commented 7 years ago

@fedorov I will return the dicom triplets in the same way as you mentioned above. It makes sense to stay consistent.

chribaue commented 7 years ago

@lassoan, @fedorov

following the comments above, I'll go ahead and (a) store all statistical computation results and metadata in the table node referenced by the logic's parameter node instead of using the Logic's self.statistics (b) change the keys and table headers from "LM volume" to "Labelmap.volume" etc.

lassoan commented 7 years ago

change the keys and table headers from "LM volume" to "Labelmap.volume" etc.

It would be perfect for keys and could be displayed in a tooltip. However, using "Labelmap.volume" as table header would not be either short, clear, or user friendly.

We could simply use "volume" when volume is computed only one method.

We would still need to figure out how to distinguish if the same quantity is computed by different methods. Probably appending something to the column name would work, such as "volume [LM]"? Where "LM" would be an abbreviated plugin name. Using the abbreviation would allow short column names. Unit, full description of the quantity and the plugin that it computed, etc. are shown in the popup anyway.

fedorov commented 7 years ago

To me, the only advantage of "volume [LM]" is that it is short, but to me it is not user friendly or clear. It is also more confusing.

Here are few more suggestions to make the interface more user friendly and clear: