Integrate with updated SegmentStatistics

fedorov commented 7 years ago

It would be good to start integrating with the code being developed in #161 to see if there are any gaps and more fully test the new functionality.

chribaue commented 7 years ago

@fedorov @che85 I made some changes to the segment statistics logic. It stores now measurement information together with the measurements themselves to avoid potential inconsistencies in case a user would modify the scalar volume reference before exporting the data to a table.

che85 commented 7 years ago

Ok a few more changes in https://github.com/chribaue/Slicer/pull/7 and then it would be ready for integrating into Slicer I think.

chribaue commented 7 years ago

@che85 I merged your pull request and will test it on my machine to make sure everything works as it should. Thank you for your contribution!

fedorov commented 7 years ago

Great progress, thank you both!

fedorov commented 7 years ago

@che85 when I reload the report from DICOM, tooltip shows CodeValue for units. When I create the report, it shows CodeMeaning - this should be consistent

fedorov commented 7 years ago

there are other minor issues like inconsistencies like capitalization and we might need to think how to keep info about the plugin name, but we can do this later. Looks ok to me!

che85 commented 7 years ago

The way those information are populated from SegmentStatistics is:

measurementInfo = statistics["MeasurementInfo"][key] if key in statistics["MeasurementInfo"] else {}
columnName =  measurementInfo['name'] if measurementInfo and 'name' in measurementInfo else key
columnName += ' ['+calculator.id+']' if (calculator and len(calculatorsWithResults)>1) else ''
col.SetName( columnName )
if measurementInfo:
  for mik, miv in measurementInfo.iteritems():
    if mik=='name':
      table.SetColumnLongName(columnName, str(miv))
    elif mik=='description':
      table.SetColumnDescription(columnName, str(miv))
    elif mik=='units':
      table.SetColumnUnitLabel(columnName, str(miv))
    else:
      table.SetColumnProperty(columnName, str(mik), str(miv))

Since we don't have the exact same information when reading DICOM, how should I proceed?

che85 commented 7 years ago

It's done this way when reading DICOM SR:

for measurementItem in measurement["measurementItems"]:
  col = table.AddColumn()

  unit = measurementItem["units"]["CodeMeaning"]

  if "derivationModifier" in measurementItem.keys():
    description = columnName = measurementItem["derivationModifier"]["CodeMeaning"]
  else:
    description = measurementItem["quantity"]["CodeMeaning"]
    columnName = "%s %s" % (description, measurementItem["units"]["CodeValue"])
    description = "%s in %s" % (description, unit)

  col.SetName(columnName)
  table.SetColumnLongName(columnName, columnName)
  table.SetColumnUnitLabel(columnName, unit)
  table.SetColumnDescription(columnName, str(description))

che85 commented 7 years ago

Which one is better? How can we simplify that process? Maybe it would make sense to have that in SegmentStatistics and then populate it from there when reading DICOM? On the other hand, if you are loading a DICOM SR which does not have the measurements provided by the registered calculators then you cannot depend on SegmentStatistics

che85 commented 7 years ago

@chribaue After discussing with @fedorov we were thinking that it would make sense to display only information within the table that would actually be saved to DICOM for guaranteeing consistent display when reading from DICOM.

Looking at the table header you would display "{QuantityCode meaning} + {UnitCode value}" (without displaying [SV] [LM] etc?

The tooltip should display the following:

"Name" "Unit: {UnitCode meaning}" "Name in {UnitCode value}"

Right now when creating a report, I am getting inconsistencies within the table regarding the tooltip showing human readable "Hounsfield unit" but "cc" instead of "cubic centimeter":

chribaue commented 7 years ago

@che85 @fedorov There are a few issues with exporting to DICOM from SegmentStatistics and importing DICOM SRs (from potential other sources): (a) the segment statistics plugins might not provide DICOM codes for certain measurements. This was always intended as optional meta-information. These results cannot be exported to DICOM SR. (b) Different plugins do create measurements with the same name (e.g. the volume measurements, mean gray value) and a plugin may provide the measurement results twice, with different units. That's the way SegmentStatistics was intended initially and @lassoan is very persistent in wanting something in the column header that reflects which plugin was used to create it. Otherwise we would have e.g. three "volume mm3" (b) when importing DICOM SRs generated by other tools there is no way to match all of them to a SegmentStatistics Plugin. (b) I'm not sure about DICOM SR, but for the purpose of data provenance would there be a way to store for each measurement which plugin produced it?

We could add a separate exportToDICOMTable() which does what you two suggested above plus perform additional checks on export like e.g. make sure every measurement the user wants to export has proper DICOM codes and there are no duplicate DICOM QuantityCodes+UnitCodes pairs and alert the user if this is not the case. But I am not sure if this functionality should be part of the SegmentStatisticsLogic or if it should be part of QuantitativeReporting, since it is something more specific to DICOM export/import.

fedorov commented 7 years ago

(a) the segment statistics plugins might not provide DICOM codes for certain measurements. This was always intended as optional meta-information. These results cannot be exported to DICOM SR.

That is fine. As you said, we will skip them on export, so they will not show up on import.

However, for those measurements that do generate codes, why not make the description consistent with the coded entries by auto-generating it from those? Why does it need to be something else? This creates unnecessary room for error.

(b) Different plugins do create measurements with the same name (e.g. the volume measurements, mean gray value) and a plugin may provide the measurement results twice, with different units. That's the way SegmentStatistics was intended initially and @lassoan is very persistent in wanting something in the column header that reflects which plugin was used to create it. Otherwise we would have e.g. three "volume mm3"

I find it very confusing including some abbreviation denoting the plugin in the column header. I think the right place for this is in the tooltip, with the full plugin name included (i.e., "Plugin: ScalarVolumeStatistics). Would this be sufficient?

Another issue with those plugins is that these labels have meaning only during a given Slicer session. Once you consider reloading those measurements, name of the plugin may be meaningless. We could store the source code URL and hash, to have at least some assistance in tracking down how the measurement was done, but it would be of no use to the non-technical end user - they would just see a string that encoded some plugin at some time in the past, and it will have no value (software tool could be different, plugin could be changed or deleted, implementation could have changed).

As is right now, I don't know how an average user can figure out why there is, for example, "[SV]" attached to the column name, and what it means - don't you agree?

(c) when importing DICOM SRs generated by other tools there is no way to match all of them to a SegmentStatistics Plugin.

Correct. That is why it is important that we have at least somewhat consistent way to display measurements that were not generated by Slicer.

(d) I'm not sure about DICOM SR, but for the purpose of data provenance would there be a way to store for each measurement which plugin produced it?

We can store this information in another modifier field that we can attach to the measurement. There are some things that need to happen on the dcmqi side first (see this PR: https://github.com/QIICR/dcmqi/pull/287), so I would not hold merging with Slicer.

lassoan commented 7 years ago

@lassoan is very persistent in wanting something in the column header that reflects which plugin was used to create it

The only hard limitation is that each column name must be unique, as VTK plotting and other some features only work correctly if there are no duplicate column names. Column names can be distinguished any way we want, for example Volume, Volume [1], Volume [2] could work, too.

We can make tooltip display more sophisticated, if needed. For example, we could print all column metadata in the tooltip; and/or add some more standard column metadata fields (generator algorithm name, version, ... preferably information that can be directly mapped to standard DICOM fields) that are displayed in the tooltip.

chribaue commented 7 years ago

Summarizing all this input we could do the following:

In SegmentStatisticsLogic.exportToTable() we generate the column names based on the {QuantityCode meaning} + {UnitCode value} if DICOM codes are available otherwise we use the plugin provided measurement name. If there are duplicate column names we turn them into "Volume mm3 [1]" "Volume mm3 [2]". We store in the tooltip the full plugin name and the MeasurementMethodCode Meaning.

When exporting to DICOM SR we store the QuantityCode, UnitCode and MeasurementMethodCode. In case different plugins provide the same triplet, I guess we will store only one of them and hopefully the measurement results produced by the different plugins are identical.

When importing from DICOM SR, we also populate the column names as {QuantityCode meaning} + {UnitCode value} and a number [1] [2] for duplicates etc. The tooltip gives information about which MeasurementMethod was used, but not necessarily the producing plugin anymore.

Would this work for everyone?

lassoan commented 7 years ago

we generate the column names based on the {QuantityCode meaning} + {UnitCode value}

How those will look like?

The tooltip gives information about which MeasurementMethod was used, but not necessarily the producing plugin anymore.

Should we add dedicated column metadata fields for this? The advantage would be that we could display it in a standard way in the tooltip and the data would be more structured (compared to just adding the generator algorithm name and version in the description text). What DICOM fields are used for specifying the quantification algorithm?

chribaue commented 7 years ago

How those will look like?

e.g. "Volume cm3" instead of "volume cc" and instead of "mean" it should be "Mean" or "Mean SUVbw" if we got the units from the dicom files.

lassoan commented 7 years ago

What will be the name of a column that contains mean intensity of a CT volume?

chribaue commented 7 years ago

I think: Mean hnsf'U

fedorov commented 7 years ago

{QuantityCode meaning} + {UnitCode value}

Almost. As @lassoan alluded, but this won't work in case of min/max etc, since the quantity is defined by the input volume units, and the measurement is encoded in the modifier. So I think we should do the same as done in the SR plugin: if derivation modifier is present, use it, otherwise use quantity. Alternatively, we could do {QuantityCode meaning}, {DerivationModifier meaning}, if field width is not of concern.

What DICOM fields are used for specifying the quantification algorithm?

Basically, the only thing that fits this purpose that I can think of is the modifier on the measurement item (row 6 in TID1419). We could have a custom concept codes for plugin name, version, any other information that we think is meaningful, and add those to the list of modifiers (a measurement item can have any number of modifiers). Then we could include this list of modifiers as column metadata. But perhaps we should leave this refinement for later? I would really like SegmentStatistics improvements to be merged soon, so we can leverage the updated Slicer and QuantitativeReporting for the MICCAI tutorial.

lassoan commented 7 years ago

I would really like SegmentStatistics improvements to be merged soon, so we can leverage the updated Slicer and QuantitativeReporting for the MICCAI tutorial.

I agree. We can keep refining things after everything is integrated.

chribaue commented 7 years ago

are the following columns headers now okay for display in the segmentStatistics Module and later after dicom import?

{QuantityCode meaning} or {DerivationModifier meaning} if present plus {UnitCode value} plus [1][2] for duplicates?

Then I'll go ahead and implement these changes.

fedorov commented 7 years ago

@chribaue looks fine to me! We can always improve later, if we find that this does not work in certain situations.

One caveat that I just realized - we should have a backup when a plugin does not define these codes. But another way to deal with this is to force every plugin to define them, and provide instructions how to do it (they can always define private coding schemes that would be plugin-specific).

fedorov commented 7 years ago

... and all of the plugins that are right now in SegmentStatistics have those codes defined, so this is not of immediate concern.

chribaue commented 7 years ago

I found one measurement item where our outlined recipe to derive a meaningful column header name causes issues:

For "voxel_count" {QuantityCode meaning} or {DerivationModifier meaning} if present plus {UnitCode value} results in "Number of voxels voxels"...

fedorov commented 7 years ago

perhaps a comma preceding units could help, or we could not include units in the column name at all?

chribaue commented 7 years ago

maybe we could could put the units in brackets? Number of voxels (voxels) Volume (mm3) [1] Volume (mm3) [2] ...

fedorov commented 7 years ago

maybe we could could put the units in brackets?

that is fine with me too

lassoan commented 7 years ago

Volume [mm3] (1)

Would be even better, as on Windows, duplicate file names are resolved by adding (1), (2), ... Y default. Also, I think [] is used more commonly than () for units.

chribaue commented 7 years ago

Volume [mm3] (1)

okay

chribaue commented 7 years ago

I made the changes regarding the column header we've discussed. As a note: vtkMRMLTableNode allows storing arbitrary properties as name/value pair with each column, however, what is displayed in the tooltip is currently hard coded in Slicer. Therefore, I added the name of the Calculator plugin to the description field of the column.

screenshot from 2017-08-14 14 50 50

fedorov commented 7 years ago

I added the name of the Calculator plugin to the description field of the column.

small nitpick - how about instead of "Labelmap: Volume mm3" use "Volume [mm3], Labelmap plugin", or something similar that would convey more explicitly that "Labelmap" is the name of the calculator?

chribaue commented 7 years ago

screenshot from 2017-08-14 16 03 39

lassoan commented 7 years ago

Meaning of "Labelmap calculator" is not obvious. It would be better to match the displayed plugin name exactly: "Labelmap Statistics". Full text could be something like this: "Volume [mm3], computed by Labelmap Statistics plugin" or "Volume [mm3], computed by Segment Statistics / Labelmap Statistics method" I think html formatting should work, too, so it can be multiple lines: "Volume [mm3]
Computed by Segment Statistics / Labelmap Statistics method."

chribaue commented 7 years ago

screenshot from 2017-08-15 08 35 39

fedorov commented 7 years ago

@chribaue as discussed today at the Slicer call, we are switching to preparing a new Slicer release. I think the improvements to SegmentStatistics are a in a good shape, and will be very useful for our MICCAI tutorial, so we should try to get them into the release. Can you finalize and make a PR?

cc @jcfr

chribaue commented 7 years ago

Can you finalize and make a PR?

I'll test it a last time on another machine with another OS and then make a PR

chribaue commented 7 years ago

I tested on Mac with the latest nightly build. The SegmentStatistics Module works as expected.

But, I loaded a CT scan from DICOM so that the attribute VoxelValueUnits is set for the VolumeNode. SegmenStatistics displays for the units for e.g. the Mean measurement with double square brackets [[hnsf'U]], because the VoxelValueUnits DICOM triplet the value is set as [hnsf'U] which should probably be hnsf'U.

fedorov commented 7 years ago

No, [hnsf'U], with brackets, is the correct UCUM code for Hounsfield units.

We talked about this in the past, and we discussed establishing a mapping to more widely used abbreviations for some of the codes, and HU specifically. Maybe we can add this item to the future refinements list? We would need to have a location for such mapping to apply it both at the time measurements are calculated, and also when they are read from SR.

chribaue commented 7 years ago

So, is the way the header names are -- with double brackets -- then fine for now? Or should we change anything now before I make a PR?

fedorov commented 7 years ago

From my point of view, it is not a huge deal, but perhaps another idea is to add comma before units, and remove the square brackets. But if it stays as is it is also fine with me. I think it is more important to get it into the release, and leave time to refine any possible issues that might come up in QuantitativeReporting after integration.

lassoan commented 7 years ago

UCUM code value is designed to be unambiguous and not intended to be displayed for users.

DICOM allows the code meaning to be set from a number of different sources (http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_7.2.2.html):

the "print" value specified in UCUM (e.g., "mmHg" for Code Value mm[Hg])
the same string as sent in the Code Value (e.g., "ml/s")
constructed from the "names" of individual components using the Americanized form of name (e.g., "milliliters/second")
constructed from the "names" of individual components using the European form of name (e.g., " millilitres /second")

For Hounsfield unit, print value is "HF" (http://unitsofmeasure.org/ucum.html), which would generate a much more user-friendly [HF].

Overall, I think it would be a better choice to show code meaning (which is intended to be human-readable, can be translated, etc) to users in the table column header. We could set the code meaning as we see best fit for our users. In the future we could make it configurable in application settings (similarly to how preferences on time&date display format is different in each country). We would still have the exact code in the schema.csv file for unambiguous machine processing.

fedorov commented 7 years ago

@lassoan makes sense, but issues may come up when we read documents created using other tools. They may follow different conventions for defining CodeMeaning, and this can be confusing to the users. I think it might be more consistent to show CodeValue, and in the corner cases, like HU, define a centralized mapping table. What do you think?

lassoan commented 7 years ago

I've found this note in the UCUM standard: "If unit symbols for the purpose of display and print are derived from The Unified Code for Units of Measure units, the square brackets can be removed. However, display units are out of scope of The Unified Code for Units of Measure. " (http://unitsofmeasure.org/ucum.html)

So, for now we can just choose to get rid of the square brackets and have "Minimum [hnsf'U]" and implement a nicer solution later.

chribaue commented 7 years ago

for the purpose of display ...the square brackets can be removed

This should give reasonable results. I will make this change in the source code.

fedorov commented 7 years ago

Sounds good to me!

QIICR / QuantitativeReporting

Integrate with updated SegmentStatistics #168

The tooltip should display the following:

Right now when creating a report, I am getting inconsistencies within the table regarding the tooltip showing human readable "Hounsfield unit" but "cc" instead of "cubic centimeter":