fedorov / MultiVolumeImporter

A 3D Slicer module to support import of multi-volume data from non-DICOM sources, and provide plugin for importing such data from images saved in DICOM.
http://wiki.slicer.org/slicerWiki/index.php/Documentation/Nightly/Modules/MultiVolumeImporter

Fix loading confidence values #35

Closed lassoan closed 5 years ago

lassoan commented 5 years ago

Commit https://github.com/fedorov/MultiVolumeImporter/commit/1f7b395566cdfa09aecf68a429afff431657d57c has introduced inconsistent confidence values. This pull request addresses the most important regression, but there are some additional issues to discuss:

  1. The most important problem is that the main mechanism for grouping frames (examineFiles) was degraded to confidence = 0.5 !!! It should have the highest confidence, because it means that a single series is detected as a multi-volume by a known FrameIdentifyingDICOMTag.

In contrast, examineFilesMultiseries, which is based on a much weaker hypothesis (it lumps together all series in the selection), sets a confidence value of 0.9 or 1.0.

  2. DICOM plugins in Slicer core should never use a confidence value of 1.0 (= absolute certainty), as it prevents more specific plugins in extensions from getting the highest priority. You can choose very high values (0.9, 0.95, 0.99), but never 1.0.

  3. The commit comment of https://github.com/fedorov/MultiVolumeImporter/commit/1f7b395566cdfa09aecf68a429afff431657d57c does not match what is done in the commit. Which one is correct: the comment or the code change?

This commit only addresses issue (1).
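
To illustrate why the default confidence matters, here is a simplified sketch of the priority mechanism (not Slicer's actual implementation; the class and function names are made up): the DICOM browser offers the loadable with the highest confidence by default, so a plugin that falls back to 0.5 is outranked by any plugin reporting 0.9.

```python
# Simplified sketch (hypothetical, not Slicer's actual code) of how
# loadables are ranked: the candidate with the highest confidence wins,
# so a plugin left at the 0.5 default loses to one reporting 0.9.

class Loadable:
    def __init__(self, name, confidence):
        self.name = name
        self.confidence = confidence  # 0.0 .. 1.0; Slicer's default is 0.5

def select_default(loadables):
    """Return the loadable that would be offered to the user by default."""
    return max(loadables, key=lambda loadable: loadable.confidence)

candidates = [
    Loadable("examineFiles (left at default 0.5)", 0.5),
    Loadable("examineFilesMultiseries (0.9)", 0.9),
]
print(select_default(candidates).name)  # examineFilesMultiseries (0.9)
```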

fedorov commented 5 years ago

Interesting. Thank you for looking into this issue and commenting on it. I cannot make time to work on it right now, but I appreciate your PR and will look at it a bit later. If you feel it is urgent and needs to be addressed today, please let me know!

lassoan commented 5 years ago

Thank you, I really appreciate the fast response. We forked the repository for the project where we need this urgently, so it is fine if you need a couple of days to review this.

fedorov commented 5 years ago

@lassoan the commit you mention does not introduce a confidence of 0.5; you probably meant 0.9.

The most important problem is that the main mechanism for grouping frames (examineFiles) was degraded

In contrast, examineFilesMultiseries is based on a much weaker hypothesis (it lumps together all series in the selection)

I think that, unfortunately, the definition of "the main mechanism" depends on the specific combination of equipment/data type/etc. examineFilesMultiseries is there because some manufacturers (and some specific sample datasets) put individual time frames of a 4D acquisition into separate series.

The intent of the commit you mention, if I remember correctly, was to downgrade the significance of the TemporalPositionIdentifier sort tag, since it was non-specific for the dataset reported by a user and prevented the results of more specific strategies from being loaded. When users do not toggle Advanced mode and do not know what to expect, they just load the loadable which is on top.

DICOM plugins in Slicer core should never use confidence value of 1.0 (=absolute certainty) as it prevents more specific plugins in extensions to get highest priority.

I didn't know that, I should change it then.

Commit comment of 1f7b395 does not match what is done in the commit. Which one is correct? The comment or the code change?

I don't know which one is correct ... It looks like the commit message would address your second concern, as it would just reduce all confidences by 0.1 but would not change the order of multi-volume loadables by confidence.

fedorov commented 5 years ago

That discussion aside, I am not sure I understand how your commit addresses concern 1 (and its description is not consistent with what the code does). The earlier commit downgraded TemporalPositionIdentifier, but only for the multi-series parse strategy. Your current commit does not change that, but downgrades TemporalPositionIdentifier for the main strategy.

Looking at the code again, I do not see how this condition would ever be satisfied:

https://github.com/fedorov/MultiVolumeImporter/blob/1f7b395566cdfa09aecf68a429afff431657d57c/MultiVolumeImporterPlugin.py#L193-L194

because TemporalPositionIdentifier is not listed in the prescribedTags here: https://github.com/fedorov/MultiVolumeImporter/blob/1f7b395566cdfa09aecf68a429afff431657d57c/MultiVolumeImporterPlugin.py#L168

I have not looked into this code in a while, and I do not have enough time left to finish this now...

lassoan commented 5 years ago

Maybe in your commit https://github.com/fedorov/MultiVolumeImporter/commit/1f7b395566cdfa09aecf68a429afff431657d57c line 509 was removed unintentionally.

Your current commit does not change that, but downgrades TemporalPositionIdentifier for the main strategy.

Removing the setting of the confidence value means that the default (0.5) is used. Therefore, setting the confidence value to 0.9/1.0 is an upgrade, not a downgrade. I've restored the confidence value setting using the same logic that you introduced in examineFilesMultiseries (set it to 0.9/1.0 depending on whether tagName == 'TemporalPositionIdentifier'). I don't mind tuning the confidence value depending on 'TemporalPositionIdentifier'; I've just used the same logic for consistency. If you don't think it's needed, we can simply set it to 1.0.

As we support more and more scanners/models/imaging protocols, there is an increased risk that a modification or new feature breaks existing functionality. We have not added any automatic tests for this importer because these 4D files are big and often we are not allowed to share them. What do you think about creating simplified DICOM files from scratch, with minimal content (using dummy image data of very small size, including only the tags that allow DICOM database import and multi-volume loading), and creating automatic tests using these?
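
A minimal sketch of what such a test could assert, using hand-written tag dictionaries in place of real DICOM files (the tag values and the helper function are illustrative, not the plugin's actual code):

```python
# Hypothetical test sketch: tiny per-frame tag dictionaries stand in for
# real DICOM instances, and we check that grouping by a frame-identifying
# tag yields the expected number of volumes and frames per volume.

def group_frames_by_tag(frames, tag):
    """Group frame metadata dicts by the value of a frame-identifying tag."""
    groups = {}
    for frame in frames:
        groups.setdefault(frame[tag], []).append(frame)
    return groups

# Four dummy "frames": two temporal positions, two slices each.
frames = [
    {"TriggerTime": "0",   "InstanceNumber": 1},
    {"TriggerTime": "0",   "InstanceNumber": 2},
    {"TriggerTime": "100", "InstanceNumber": 3},
    {"TriggerTime": "100", "InstanceNumber": 4},
]

groups = group_frames_by_tag(frames, "TriggerTime")
assert len(groups) == 2                            # two time points
assert all(len(g) == 2 for g in groups.values())   # two slices per volume
```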

fedorov commented 5 years ago

Oh, I see - I didn't know/forgot about the default 0.5!

Yes, it would definitely be very appropriate to add testing and include realistic datasets. For PKModeling we have been working around the image size by cropping small sections of a real DCE multi-volume. Perhaps we could explore something similar with the original DICOMs by cropping PixelData and updating Rows/Columns accordingly. This might be easier than composing the datasets from scratch, but I have not tested how that would work in practice. It would also be better if we had DICOM datasets that are as valid as possible, even if they are "phantoms". Let's get back to this after RSNA?
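
A rough sketch of the cropping idea, with a plain dict standing in for a pydicom dataset (the names and layout are illustrative; a real implementation would rewrite the PixelData element of an actual DICOM file):

```python
# Hypothetical sketch: crop a frame's pixel matrix to a small block and
# keep the Rows/Columns attributes consistent with the new pixel data.

def crop_frame(frame, rows, cols):
    """Return a copy of the frame cropped to the top-left rows x cols block."""
    pixels = [row[:cols] for row in frame["PixelData"][:rows]]
    return {"Rows": rows, "Columns": cols, "PixelData": pixels}

# A dummy 4x4 "frame" with pixel values 0..15.
full = {"Rows": 4, "Columns": 4,
        "PixelData": [[r * 4 + c for c in range(4)] for r in range(4)]}

small = crop_frame(full, 2, 2)
assert small["Rows"] == 2 and small["Columns"] == 2
assert small["PixelData"] == [[0, 1], [4, 5]]
```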

lassoan commented 5 years ago

Yes, either building up a simplified dataset or dumbing down an existing one would work (and neither of them is very easy).

Let's get back to this after RSNA?

Sounds good. We can also chat there, if we happen to have time.

pieper commented 5 years ago

Let's also talk with some of the vendors at RSNA about hosting repositories of testing data. I have talked informally with some of them in the past, and there is some interest, but now may be the time when we can make some concrete progress.

jcfr commented 5 years ago

vendors at RSNA about hosting repositories of testing data

Are you thinking about (1) having a common repository or (2) having each vendor be responsible for setting up a public server?

Ideally, vendors would use the data pushed to that server (or an up-to-date mirror) for their own testing; this would help keep the dataset current.

That said, since each of these vendors is (most likely) competing with the others, I am curious to see how this will unfold.

Cc: @thewtex

pieper commented 5 years ago

@jcfr I was thinking of creating a repository of curated testing data that could be mirrored to high-availability / high-bandwidth servers that expose a standard protocol (specifically DICOMweb).

Probably that would take the form of a git-lfs canonical collection and a list of DICOMweb servers that would be auto-populated with the contents of the repository. Then any testing or demonstration code could access the data from whatever server is convenient. Client access should be through DICOMweb so we can query metadata, access individual frames, etc.
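
For illustration, a QIDO-RS study query against such a server could be built like this (a sketch only; the base URL is a placeholder, and any server from the hypothetical mirror list could be substituted):

```python
# Sketch of constructing a DICOMweb QIDO-RS /studies query URL with
# attribute filters. No request is sent; this only builds the URL.
from urllib.parse import urlencode

def qido_studies_url(base_url, **filters):
    """Build a QIDO-RS /studies query URL with optional attribute filters."""
    url = base_url.rstrip("/") + "/studies"
    query = urlencode(filters)
    return url + "?" + query if query else url

# Placeholder server URL for illustration.
url = qido_studies_url("https://example.org/dicomweb", ModalitiesInStudy="MR")
print(url)  # https://example.org/dicomweb/studies?ModalitiesInStudy=MR
```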

It's true the vendors are competing, but standards for data representation and access are generally considered "pre-competitive". Anyone willing to pay the storage and bandwidth charges can have equal rights to host the data.

fedorov commented 5 years ago

I agree, let's try again with the vendors. Let's brainstorm and put together a short list of potential candidates at RSNA. As for competition, the only such issue I can think of is a vendor considering a comprehensive dataset that illustrates various platform-specific deviations from DICOM to be a strategic asset for providing robust analysis/visualization tools. But I hope there are more vendors that would simply appreciate a public dataset to which everyone could contribute, and which would make all vendors' products more robust.