QIICR / ProjectIssuesAndWiki

Project to keep track of overall progress and milestones of QIICR
https://github.com/QIICR/ProjectIssuesAndWiki/wiki

Slicer DICOM UID org root #16

Closed fedorov closed 10 years ago

fedorov commented 10 years ago

As we plan to create DICOM objects in Slicer, should we consider using a proper UID org root? Or is there one in DCMTK that we should use?

Steve, you mentioned there was an effort in the past to get this UID root for Slicer - should we follow up on that effort?

pieper commented 10 years ago

At one point I got an SPL root as a subpart of the BWH space but I didn't end up using it. We could probably dig it up or get a new one. We should discuss the nature of what these roots mean and how we want to treat them. Ideally the operator of the software (the institution) should provide the org root, not the software itself. For example, we would not want people to generate documents that claimed to be from BWH.

As I understand it, the spec calls for the UID to be strictly globally unique (which is sometimes possible, but in practice typically means 'probably unique'), with some pseudo-provenance property about the organization (hospital) that generated it. This is unlike git hashes, which are essentially random numbers that are very, very unlikely to collide and contain no other information. So we need to take care in how we handle UIDs to live up to the spirit of the spec. We should also check the spec itself for clarification, since it's been many years since I read it in detail (the 'non-organizational' approach is probably a well-established convention by now). Slicer should at least allow people to do it right if they want to put in the effort, but Slicer should still generate valid documents with a default configuration.

Right now Slicer uses the gdcm generator, which relies on a default root and the MAC address of the network adapter. This has caused some practical issues: gdcm printed a message to stderr on Linux machines in airplane mode, which broke some CLI processing, and, more importantly, it probably meant gdcm could not generate a valid UID without a network adapter. DCMTK has a documented algorithm for how its UIDs are generated, and it looks robust.

http://www.medicalconnections.co.uk/kb/UID_Rules

http://stackoverflow.com/questions/10295792/how-to-generate-sopinstance-uid-for-dicom-file

http://en.wikipedia.org/wiki/Universally_unique_identifier

http://forum.dcmtk.org/viewtopic.php?f=4&t=910&sid=dcbebbe5e89764b36e0dc76540347b4d

http://support.dcmtk.org/docs-dcmrt/classOFUUID.html

dclunie commented 10 years ago

Hi guys

I doubt there is a need for a Slicer-specific root.

Marco also commented on dcmtk's approach again here:

http://forum.dcmtk.org/viewtopic.php?t=783

Note the dependence on a timestamp and machine ID.

I usually use a similar mechanism.

If we are going to use dcmtk for the project, I would stick with the mechanism they use rather than considering other alternatives, unless there is significant concern about it.
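The timestamp-plus-machine-ID idea can be sketched roughly as follows. This is only an illustration and not DCMTK's actual algorithm: the function name `timestamp_uid`, the choice and order of components, and the placeholder root `1.2.3.4` in the usage line are all assumptions made for this example.

```python
import itertools
import os
import time
import uuid

# Per-process counter so two UIDs issued within the same second still differ.
_counter = itertools.count()


def timestamp_uid(root, machine_id=None):
    """Build a UID as root.machine.pid.seconds.counter (illustrative layout)."""
    if machine_id is None:
        # uuid.getnode() returns the 48-bit hardware address, or a random
        # 48-bit stand-in when no network adapter can be queried.
        machine_id = uuid.getnode()
    parts = [
        root,
        str(machine_id),
        str(os.getpid()),
        str(int(time.time())),
        str(next(_counter)),
    ]
    uid = ".".join(parts)
    # DICOM limits a UID to 64 characters, so long roots need shorter suffixes.
    if len(uid) > 64:
        raise ValueError("UID exceeds 64 characters: " + uid)
    return uid


# "1.2.3.4" is a placeholder root, not a registered one.
print(timestamp_uid("1.2.3.4"))
```

Every component is the decimal form of a non-negative integer, so no component gets a leading zero.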

The ability to regenerate the same UID is primarily a concern during testing: if one has an output file from a previous run and wants to compare against it, it is easier if the UIDs don't change. In production, on the other hand, the same UID should never be reissued unless you can guarantee that it refers to exactly the same content. Reusing a UID is sometimes desirable, e.g., in round-trip conversions from one form to another (such as single-frame to multi-frame and back), but this is usually achieved by recording what the previous UIDs were.

One can conceive of convoluted ways to manipulate UID generation such that they are repeatable during specific tests but not in production, or filters on test results that exclude UID differences (which eliminates binary file comparison).
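One possible way to make UIDs repeatable in tests only is to inject the generator, as in the sketch below. The names `make_uid_factory` and `test_seed` are hypothetical, and the UIDs use the UUID-derived "2.25" form discussed elsewhere in this thread; nothing here reflects how Slicer or DCMTK actually do it.

```python
import random
import uuid


def make_uid_factory(test_seed=None):
    """Return a zero-argument function that yields '2.25.'-style UIDs.

    test_seed is a hypothetical knob: leave it as None in production.
    """
    if test_seed is None:
        # Production path: a fresh random UUID on every call.
        return lambda: f"2.25.{uuid.uuid4().int}"

    rng = random.Random(test_seed)  # seeded, so the sequence repeats per run

    def next_test_uid():
        # Version-4 UUID built from seeded bits; suitable for tests only.
        return f"2.25.{uuid.UUID(int=rng.getrandbits(128), version=4).int}"

    return next_test_uid


new_uid = make_uid_factory()          # production: never repeats in practice
test_uid = make_uid_factory(1234)     # tests: same sequence on every run
```

The seeded path must never be reachable in production, since it would reissue UIDs for different content.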

An alternative approach that is also not deterministic (at least to the extent that you cannot generate the same UUID twice) is just to use UUIDs that are converted to UIDs (and thus depend on the reliability of whatever UUID source you have access to).

Steve already included a link to the Wikipedia description of UUIDs.

See DICOM PS 3.5 Annex B.2, or the CP that described this:

http://www.dclunie.com/dicom-status/status.html#CP1156

Also:

http://www.dclunie.com/medical-image-faq/html/part2.html#UUID

E.g., the UUID

f81d4fae-7dec-11d0-a765-00a0c91e6bf6

becomes the DICOM UID

2.25.329800735698586629295641978511506172918
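The conversion amounts to writing the 128-bit UUID as a single decimal integer after the `2.25.` prefix. A minimal sketch using only Python's standard `uuid` module (the helper name `uuid_to_dicom_uid` is made up for this example):

```python
import uuid


def uuid_to_dicom_uid(value):
    """Render a UUID (object or string) as a '2.25.'-prefixed DICOM UID."""
    u = value if isinstance(value, uuid.UUID) else uuid.UUID(str(value))
    # The whole UUID becomes one unsigned decimal integer; str() of an int
    # never emits leading zeros, which UID components must not have.
    uid = f"2.25.{u.int}"
    # "2.25." plus at most 39 digits stays within the 64-character UID limit.
    assert len(uid) <= 64
    return uid


print(uuid_to_dicom_uid("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
# prints 2.25.329800735698586629295641978511506172918
print(uuid_to_dicom_uid(uuid.uuid4()))  # a fresh random UID
```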

Also, see:

http://www.dclunie.com/pixelmed/software/javadoc/com/pixelmed/utils/UUIDBasedOID.html

David

pieper commented 10 years ago

Thanks for the clarification and info David!

Personally, I like the 2.25 approach since it treats everything the same and everyone agrees that 'highly highly low probability of collisions' is good enough for all practical purposes. But I'm also fine with using the dcmtk default. If we have time or motivation to work on it, I think the provenance of the software/hardware system that created the object should be more explicitly identified and the prefix of the UID shouldn't carry any particular meaning. Ideally if data is meant to be trusted it should be signed rather than relying on a property of the UID. For testing we should probably ignore the UID when comparing results from different runs.
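Ignoring UIDs when comparing test output could look something like the sketch below. It assumes each result has already been dumped to a plain dict keyed by DICOM attribute keyword, with sequences as lists of dicts; that representation, the helper names, and the choice of which UIDs count as "generated" are assumptions for this example.

```python
# Instance-level UIDs that change from run to run. Well-known UIDs such as
# SOPClassUID are deliberately still compared.
GENERATED_UIDS = {
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "SOPInstanceUID",
    "FrameOfReferenceUID",
}


def strip_generated_uids(attributes):
    """Return a copy without generated UIDs, recursing into sequences."""
    cleaned = {}
    for keyword, value in attributes.items():
        if keyword in GENERATED_UIDS:
            continue
        if isinstance(value, list):
            value = [
                strip_generated_uids(item) if isinstance(item, dict) else item
                for item in value
            ]
        cleaned[keyword] = value
    return cleaned


def same_ignoring_uids(a, b):
    return strip_generated_uids(a) == strip_generated_uids(b)


run1 = {"Modality": "SEG", "SOPInstanceUID": "2.25.111"}
run2 = {"Modality": "SEG", "SOPInstanceUID": "2.25.222"}
print(same_ignoring_uids(run1, run2))  # prints True
```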

-Steve

dclunie commented 10 years ago

Hi Steve

Provenance should NEVER be assumed from UIDs (both because you are not supposed to parse them, and because some systems will mess with them during ingestion for one reason or another, especially in a research/clinical trial context that requires de-identification).

There are a whole bunch of attributes specifically designed to encode provenance, assuming that the original Manufacturer, Manufacturer's Model Name, Device Serial Number and Software Versions in the top level data set are not sufficient (or are left alone/copied from the values supplied by the scanner).

Specifically, the Contributing Equipment Sequence is designed for this. See PS 3.3 C.12.1 SOP Common Module. To summarize, it is multi-valued (multiple items) and includes:

Contributing Equipment Sequence

- Purpose of Reference Code Sequence
  - Include ‘Code Sequence Macro’ Table 8.8-1
- Manufacturer
- Institution Name
- Institution Address
- Station Name
- Institutional Department Name
- Operators' Name
- Operator Identification Sequence
  - Include ‘Person Identification Macro’ Table 10-1
- Manufacturer’s Model Name
- Device Serial Number
- Software Versions
- Spatial Resolution
- Date of Last Calibration
- Time of Last Calibration
- Contribution DateTime
- Contribution Description
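As a rough illustration of what one item of this sequence might carry for a processing tool, here is a sketch that builds it as a plain Python dict keyed by DICOM attribute keywords. A real toolkit would use its own dataset classes instead; the helper name and the example values ("ExampleOrg" and so on) are hypothetical. The code (109102, DCM, "Processing Equipment") is taken from the DICOM list of contributing equipment purposes.

```python
from datetime import datetime


def contributing_equipment_item(manufacturer, model, software_versions,
                                description, when=None):
    """Build one Contributing Equipment Sequence item as a plain dict."""
    when = when or datetime.now()
    return {
        "PurposeOfReferenceCodeSequence": [{
            "CodeValue": "109102",
            "CodingSchemeDesignator": "DCM",
            "CodeMeaning": "Processing Equipment",
        }],
        "Manufacturer": manufacturer,
        "ManufacturerModelName": model,
        "SoftwareVersions": software_versions,
        # DICOM DT value written as YYYYMMDDHHMMSS
        "ContributionDateTime": when.strftime("%Y%m%d%H%M%S"),
        "ContributionDescription": description,
    }


# The sequence is multi-valued, so each processing step appends another item.
dataset = {"ContributingEquipmentSequence": []}
dataset["ContributingEquipmentSequence"].append(
    contributing_equipment_item(
        "ExampleOrg", "ExampleTool", "0.0.1", "Segmentation of input series"
    )
)
```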

For SR objects, additional mechanisms are available in various templates that describe "observer context".

I think we should avoid the matter of electronic or digital signatures. Cryptographic digital signatures are defined in DICOM, and are supported in dcmtk, but raise all sorts of issues and are never used in practice.

David

pieper commented 10 years ago

Right - we're on the same wavelength here - that's why I prefer the 2.25.UUID version of the UID, since it explicitly removes the temptation to interpret anything about the UID prefix.

And yes, we want to really get the provenance concepts deeply integrated into our plans.

fedorov commented 10 years ago

Thank you for this lively discussion! I am closing this issue, since it is clear that we do not need a special UID root, which is good news, and can use the DCMTK mechanism and/or the "2.25" approach for UID generation.

michaelonken commented 10 years ago

By the way, DCMTK also permits creation of UIDs based on the 2.25 approach; see ofuuid.h.

fedorov commented 10 years ago

@dclunie clarified on today's call that the 2.25 UID generation approach is indeed part of the standard.