SMI / IsIdentifiable

A tool for detecting identifiable information in data sources (CSV, DICOM, Relational Database and MongoDB)
GNU General Public License v3.0

Run OCR on all image frames, overlays and overlay frames #58

Open howff opened 2 years ago

howff commented 2 years ago

At the moment IsIdentifiable only looks at the first frame of the image (https://github.com/SMI/IsIdentifiable/blob/169f2da8144086228f20a3f9147bf5ccc792f988/IsIdentifiable/Runners/DicomFileRunner.cs#L236), because the frame number is implicitly zero: https://fo-dicom.github.io/html/db4d5ade-f708-0eb5-1e81-39b256fb2822.htm

Images can contain multiple frames: https://dicom.innolitics.com/ciods/us-multi-frame-image/multi-frame/00280008

Additionally, images can contain an overlay "hidden" in the top ("unused") bits of the PixelData: https://dicom.innolitics.com/ciods/us-image/overlay-plane/60xx0102
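As a rough illustration of the embedded case, here is a minimal Python sketch that pulls one bit out of every pixel word. The function name and the sample values are hypothetical, and it assumes the pixel data has already been decoded into integer words and that the Overlay Bit Position is known; it is not how IsIdentifiable or fo-dicom does it.

```python
def extract_embedded_overlay(pixel_words, bit_position):
    """Return one 0/1 value per pixel, taken from the given bit of each word.

    pixel_words: iterable of ints, one per stored pixel (assumed already decoded).
    bit_position: index of the bit that carries the overlay (0 = least significant).
    """
    return [(word >> bit_position) & 1 for word in pixel_words]


# Hypothetical example: 12-bit image data stored in 16-bit words,
# with the overlay carried in the otherwise unused bit 15.
pixels = [0x0123, 0x8123, 0x8FFF, 0x0FFF]
overlay = extract_embedded_overlay(pixels, 15)
print(overlay)  # [0, 1, 1, 0]
```

The resulting 0/1 values would still need to be reshaped to the image dimensions and rendered before being passed to OCR.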

Additionally, images can contain up to 16 overlays: https://dicom.innolitics.com/ciods/us-image/overlay-plane/60xx3000 where 60xx could be 6000, 6002, up to 601E.
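For reference, the 16 possible overlay groups are the even group numbers from 6000 to 601E. A small Python sketch of enumerating them follows; the function names are made up for illustration.

```python
def overlay_groups():
    """Return the 16 even group numbers 0x6000, 0x6002, ..., 0x601E."""
    return [0x6000 + 2 * i for i in range(16)]


def overlay_data_tags():
    """Return (group, element) pairs for Overlay Data (60xx,3000) in each group."""
    return [(group, 0x3000) for group in overlay_groups()]


groups = overlay_groups()
print([f"{g:04X}" for g in groups[:3]], f"{groups[-1]:04X}")
# ['6000', '6002', '6004'] 601E
```

A runner would check each of these groups for the presence of Overlay Data rather than assuming only 6000 is used.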

Additionally, each overlay can consist of multiple frames: https://dicom.innolitics.com/ciods/nm-image/multi-frame-overlay/60xx0015
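Putting the cases above together, the set of things to OCR could be listed roughly as below. This is only a sketch in Python: the function name and the tuple layout are hypothetical, the frame counts are assumed to have been read from Number of Frames (0028,0008) and Number of Frames in Overlay (60xx,0015), and overlays embedded in the PixelData are not handled.

```python
def ocr_targets(number_of_frames=1, overlay_frames=None):
    """Build a work list of (kind, group, frame_index) tuples.

    number_of_frames: image frame count (assumed 1 when the attribute is absent).
    overlay_frames: dict mapping overlay group (e.g. 0x6000) to its frame count.
    """
    # Every image frame, not only frame 0.
    targets = [("image", None, frame) for frame in range(number_of_frames)]
    # Every frame of every overlay group that is present.
    for group in sorted(overlay_frames or {}):
        for frame in range(overlay_frames[group]):
            targets.append(("overlay", group, frame))
    return targets


# Hypothetical file: 2 image frames, overlay 6000 with 3 frames, overlay 6002 with 1.
work = ocr_targets(2, {0x6002: 1, 0x6000: 3})
print(len(work))  # 6
```

A list like this would also give tests something concrete to assert against, for example that a multi-frame file yields more than one OCR target.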

tznind commented 2 years ago

The DICOM standard allows two specific types of overlay (graphics and ROI) alongside the image, and overlays are stored as 1-bit images in the Overlay Data (60xx,3000) attribute. A dataset can have up to 16 separate overlay planes (using the repeating groups encoding).

An overlay plane that represents a region of interest (ROI) has the value “R” in the Overlay Type (60xx,0040) attribute, and ROI Area (60xx,1301), ROI Mean (60xx,1302) and ROI Standard Deviation (60xx,1303) can hold the corresponding statistics for the ROI. Every bit belonging to the ROI is set to 1, marking the corresponding pixels of the underlying image.

A graphic overlay has the value “G” in the Overlay Type (60xx,0040) attribute and is used for reference marks (reference lines), graphic annotation, bitmap text, etc. Again, all visible values in an overlay plane are set to 1.

Overlay Rows (60xx,0010) and Overlay Columns (60xx,0011) specify the height and width of the overlay plane. Overlay Bits Allocated (60xx,0100) is always 1 and Overlay Bit Position (60xx,0102) is 0 (it was used in previous versions of the standard and that usage has been retired). Overlay Origin (60xx,0050) describes the position of the first overlay point relative to the image pixels; 1\1 represents the upper-left pixel of the image.

https://stackoverflow.com/a/36727987
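To make the 1-bit packing concrete, here is a minimal Python sketch that unpacks Overlay Data into a rows-by-columns grid. The function name is hypothetical; it assumes a single-frame overlay, a little-endian transfer syntax, and the usual DICOM convention that the first overlay bit is the least significant bit of the first byte.

```python
def unpack_overlay(data, rows, columns):
    """Unpack 1-bit-per-pixel overlay bytes into a list of rows of 0/1 values."""
    needed = rows * columns
    if len(data) * 8 < needed:
        raise ValueError("overlay data is shorter than rows * columns bits")
    bits = []
    for i in range(needed):
        byte = data[i // 8]
        # Bits are consumed from the least significant end of each byte.
        bits.append((byte >> (i % 8)) & 1)
    # Split the flat bit list into one list per overlay row.
    return [bits[r * columns:(r + 1) * columns] for r in range(rows)]


# Hypothetical 2x4 overlay packed into a single byte.
plane = unpack_overlay(bytes([0b10010110]), rows=2, columns=4)
print(plane)  # [[0, 1, 1, 0], [1, 0, 0, 1]]
```

Each 1 could then be drawn as a white pixel on a black image of Overlay Rows by Overlay Columns before handing it to OCR; for a multi-frame overlay the same unpacking would be repeated at successive bit offsets.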

rkm commented 2 years ago

Another ref we may find useful: https://cdn.ymaws.com/siim.org/resource/resmgr/mimi18/presentations/18cmimi_ml-clunie.pdf

howff commented 2 days ago

This may well already be implemented, but it does require proper tests.