cati-neuroimaging / deidentification

Tool to remove metadata that could identify a subject from DICOM images used in neuroimaging research
MIT License

Improve UID generation #42

Closed. DimitriPapadopoulos closed this 1 year ago.

DimitriPapadopoulos commented 1 year ago

Start modifying UID generation.

Should eventually lead to fixing #25.

DimitriPapadopoulos commented 1 year ago

DICOM PS3.5 Section B.2 is probably the most relevant here:

B.2 UUID Derived UID

[ISO/IEC 9834-8] / [ITU-T X.667] defines a method by which a UID may be constructed from the root "2.25." followed by a decimal representation of a Universally Unique Identifier (UUID). That decimal representation treats the 128 bit UUID as an integer, and may thus be up to 39 digits long (leading zeros must be suppressed).
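A minimal sketch of the B.2 construction using only Python's standard `uuid` module. The function name `uuid_to_dicom_uid` is hypothetical and not part of this repository. The sketch also checks the 39-digit bound quoted above: the largest 128-bit value has 39 decimal digits, so a "2.25." UID is at most 44 characters.

```python
import uuid

UUID_ROOT = "2.25."  # root defined by ISO/IEC 9834-8 / ITU-T X.667


def uuid_to_dicom_uid(u: uuid.UUID) -> str:
    """Hypothetical helper: "2.25." followed by the UUID's decimal integer value."""
    # str() of a Python int never emits leading zeros, as PS3.5 B.2 requires.
    return UUID_ROOT + str(u.int)


largest = uuid.UUID(int=2**128 - 1)
print(len(str(largest.int)))                 # 39
print(len(uuid_to_dicom_uid(largest)))       # 44
print(uuid_to_dicom_uid(uuid.UUID(int=42)))  # 2.25.42
```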

Yet it is unclear what ITU-T X.667 recommends exactly:

6.3 Representation as a single integer value

A UUID can be represented as a single integer value. To obtain the single integer value of the UUID, the 16 octets of the binary representation shall be treated as an unsigned integer encoding with the most significant bit of the integer encoding as the most significant bit (bit 7) of the first of the sixteen octets (octet 15) and the least significant bit as the least significant bit (bit 0) of the last of the sixteen octets (octet 0).
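To see how clause 6.3 maps onto Python, here is a small check using the example UUID from clause 8. It relies on `uuid.UUID.bytes` holding the 16 octets in big-endian order, so its first byte is what X.667 calls octet 15. Reading those bytes as a big-endian unsigned integer should therefore give the same value as `UUID.int`. The `bytes_le` comparison only illustrates a pitfall to avoid.

```python
import uuid

u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

# Octet 15 (X.667 numbering) is the first byte of u.bytes, so a big-endian
# unsigned read of the 16 octets reproduces the single integer of clause 6.3.
from_octets = int.from_bytes(u.bytes, byteorder="big", signed=False)
print(from_octets == u.int)  # True

# Pitfall: u.bytes_le stores time_low, time_mid and time_hi_version in
# little-endian order, so reading it the same way yields a different integer.
print(int.from_bytes(u.bytes_le, byteorder="big") == u.int)  # False
```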

NOTE – The single integer value is used when the UUID forms the primary integer value of a Joint UUID arc as specified in clause 7.

8 Use of a UUID to form a URN

A URN (see IETF RFC 2141) formed using a UUID shall be the string "urn:uuid:" followed by the hexadecimal representation of a UUID defined in 6.4.

EXAMPLE – The following is an example of the string representation of a UUID as a URN:

urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

NOTE – An alternative URN format (see IETF RFC 3061) is available, but is not recommended for URNs generated using UUIDs. This alternative format uses the single integer value of the UUID specified in 6.3, and represents the above example as "urn:oid:2.25.329800735698586629295641978511506172918".
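For comparison, both URN forms of the X.667 example can be produced with the standard library. `UUID.urn` gives the recommended clause 8 form. The `urn:oid:` string is assembled by hand from the clause 6.3 integer. Stripping that prefix leaves exactly the "2.25." UID described in PS3.5 B.2.

```python
import uuid

u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

# Clause 8: recommended URN, built from the hexadecimal representation.
print(u.urn)  # urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

# RFC 3061 alternative, built from the single integer value of clause 6.3.
oid_urn = "urn:oid:2.25." + str(u.int)
print(oid_urn)  # urn:oid:2.25.329800735698586629295641978511506172918

# Without the "urn:oid:" prefix this is the DICOM UUID-derived UID.
dicom_uid = oid_urn[len("urn:oid:"):]
print(dicom_uid)  # 2.25.329800735698586629295641978511506172918
```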

See also:

Typically we could write:

>>> from uuid import uuid4
>>> u = uuid4()
>>> uid = "2.25." + str(u.int)
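A generated value can also be checked against the general UID syntax rules of DICOM PS3.5 Section 9.1. Components are digit strings separated by ".", a component may start with "0" only if it is exactly "0", and the whole UID is at most 64 characters. The sketch below encodes only those rules. `is_valid_dicom_uid` is a hypothetical helper, not an existing function of this project or of pydicom.

```python
import re
import uuid

# Hypothetical helper implementing the PS3.5 Section 9.1 syntax rules:
# digit components separated by ".", no leading zero unless the component
# is exactly "0", and a total length of at most 64 characters.
_COMPONENT = r"(0|[1-9][0-9]*)"
_UID_RE = re.compile(rf"{_COMPONENT}(\.{_COMPONENT})*")


def is_valid_dicom_uid(uid: str) -> bool:
    return len(uid) <= 64 and _UID_RE.fullmatch(uid) is not None


uid = "2.25." + str(uuid.uuid4().int)
print(is_valid_dicom_uid(uid))         # True
print(is_valid_dicom_uid("2.25.042"))  # False: leading zero in a component
```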