ImagingDataCommons / libdicom

C library for reading DICOM files
https://libdicom.readthedocs.io
MIT License

Update to 1.0 #56

Closed jcupitt closed 9 months ago

jcupitt commented 1 year ago

This PR updates libdicom to 1.0.

The headline changes are:

New parser

The DICOM file parser has been split off to dicom-parse.c. It's a callback-based parser and knows about the internal structure of PixelData. This generic parser is used by dicom-file.c to load parts of DICOM files into memory, to scan files for features, and to print files to stdout.
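
As a rough illustration of the callback style, here is a minimal self-contained sketch. It is not the code in dicom-parse.c: every `toy_` name is hypothetical, and it assumes Explicit VR Little Endian elements that all use the short 16-bit length form (so no OB, OW, SQ and friends, and nothing like PixelData).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: called once per element.
 * Return false to stop the parse early. */
typedef bool (*ToyElementFn)(uint16_t group, uint16_t element,
                             const char vr[2], uint16_t length,
                             const uint8_t *value, void *client);

/* Read a little-endian 16-bit value. */
static uint16_t toy_read_u16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

/* Walk a buffer of short-form Explicit VR Little Endian elements,
 * handing each one to fn. Returns false on truncated input. */
bool toy_parse(const uint8_t *buf, size_t size, ToyElementFn fn, void *client)
{
    assert(fn != NULL);

    size_t pos = 0;
    while (pos < size) {
        if (size - pos < 8) {
            return false;               /* truncated header */
        }
        uint16_t group = toy_read_u16(buf + pos);
        uint16_t element = toy_read_u16(buf + pos + 2);
        const char *vr = (const char *) (buf + pos + 4);
        uint16_t length = toy_read_u16(buf + pos + 6);
        pos += 8;
        if (size - pos < length) {
            return false;               /* truncated value */
        }
        if (!fn(group, element, vr, length, buf + pos, client)) {
            break;                      /* the client asked to stop */
        }
        pos += length;
    }
    return true;
}

/* Example callback: count elements through the client pointer. */
bool toy_count_cb(uint16_t group, uint16_t element,
                  const char vr[2], uint16_t length,
                  const uint8_t *value, void *client)
{
    (void) group; (void) element; (void) vr; (void) length; (void) value;
    *(int *) client += 1;
    return true;
}
```

The same walker can serve several purposes just by swapping the callback: one that stores values, one that only looks for a feature, one that prints.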

This parse API is only internal, for now at least.

Revised Filehandle API

The various filehandle API calls can now be called any number of times, and in any order. They are all optional, so it's now possible to simply open a file and immediately call dcm_filehandle_read_frame().

dcm_filehandle_read_file_meta() and dcm_filehandle_read_metadata() have been renamed as dcm_filehandle_get_file_meta() and dcm_filehandle_get_metadata(). These new functions return a const pointer to libdicom's internal copy of the parsed metadata, which must NOT be destroyed. Clone the result if you need a copy that lives longer.
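
The ownership rule is roughly the one in this sketch. It uses hypothetical `Toy` types rather than the real libdicom structs, and the "parse" is faked with a hard-coded value: the getter parses lazily on first use and hands back a borrowed const pointer, and a separate clone gives the caller something it owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a parsed metadata set and a file handle. */
typedef struct {
    char sop_class_uid[65];
} ToyMeta;

typedef struct {
    bool meta_loaded;
    ToyMeta meta;       /* owned by the handle */
    int loads;          /* how many times the header was actually parsed */
} ToyHandle;

/* Pretend to parse the header; real code would read from the file. */
static void toy_load_meta(ToyHandle *h)
{
    strcpy(h->meta.sop_class_uid, "1.2.840.10008.5.1.4.1.1.77.1.6");
    h->meta_loaded = true;
    h->loads += 1;
}

/* Return a borrowed, read-only view. The header is parsed on first use,
 * so callers can skip this call or repeat it freely.
 * The caller must not free the result. */
const ToyMeta *toy_handle_get_meta(ToyHandle *h)
{
    assert(h != NULL);
    if (!h->meta_loaded) {
        toy_load_meta(h);
    }
    return &h->meta;
}

/* Make an owned copy that can outlive the handle; release it with free(). */
ToyMeta *toy_meta_clone(const ToyMeta *meta)
{
    assert(meta != NULL);
    ToyMeta *copy = malloc(sizeof *copy);
    if (copy != NULL) {
        *copy = *meta;
    }
    return copy;
}
```

Calling toy_handle_get_meta() twice returns the same pointer and parses only once, which is the behaviour that makes the calls safe to repeat in any order.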

The metadata returned by dcm_filehandle_get_metadata() is only the metadata which can be read quickly and without using much memory. To read all metadata (including, for example, the sometimes extremely large PerFrameFunctionalGroupSequence) there's a new API call dcm_filehandle_read_metadata(). This function takes a set of stop tags. If necessary, it can be called many times to read all the file metadata.
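
To show the stop-tag idea in isolation, here is a small sketch over a plain array of tags. The `toy_` names, the 0-terminated stop list, and the choice to leave the stop tag unread are all assumptions made for the example, not a description of how dcm_filehandle_read_metadata() is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack (group, element) into one 32-bit tag: (5200,9230) -> 0x52009230. */
#define TOY_TAG(group, element) \
    (((uint32_t) (group) << 16) | (uint32_t) (element))

/* True if tag appears in the 0-terminated stop list. */
static bool toy_is_stop_tag(uint32_t tag, const uint32_t *stop_tags)
{
    for (size_t i = 0; stop_tags[i] != 0; i++) {
        if (stop_tags[i] == tag) {
            return true;
        }
    }
    return false;
}

/* Consume tags starting at *pos until the end of the list or a stop tag.
 * The stop tag itself is left unread, so a later call with a different
 * stop list can resume from it. Returns the number of tags consumed. */
size_t toy_read_until(const uint32_t *tags, size_t n_tags, size_t *pos,
                      const uint32_t *stop_tags)
{
    assert(pos != NULL && *pos <= n_tags);

    size_t start = *pos;
    while (*pos < n_tags && !toy_is_stop_tag(tags[*pos], stop_tags)) {
        *pos += 1;
    }
    return *pos - start;
}
```

A first call that stops at (5200,9230) and (7fe0,0010) reads only the cheap header elements; a second call that stops only at (7fe0,0010) then picks up the large per-frame sequence.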

There's a new API call dcm_filehandle_read_frame_position() which will read a frame at a certain (column, row) position. It automatically takes account of any ordering in PerFrameFunctionalGroupSequence, if present.
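
One way to picture the position lookup is a table from tile (column, row) to frame number, built from the per-frame ColumnPositionInTotalImagePixelMatrix and RowPositionInTotalImagePixelMatrix values (which are 1-based pixel coordinates of each tile's top-left corner). The sketch below is an assumption about the general technique, with hypothetical `toy_` names and 1-based frame numbers; it is not taken from libdicom's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-frame position: 1-based pixel coordinates of the tile's
 * top-left corner in the total pixel matrix. */
typedef struct {
    int32_t column_position;
    int32_t row_position;
} ToyFramePosition;

/* Build a table mapping tile (column, row) to a 1-based frame number.
 * Entries stay 0 where no frame covers the tile (a sparse image).
 * The caller releases the table with free(). */
uint32_t *toy_build_frame_index(const ToyFramePosition *positions,
                                uint32_t n_frames,
                                uint32_t tile_width, uint32_t tile_height,
                                uint32_t tiles_across, uint32_t tiles_down)
{
    assert(tile_width > 0 && tile_height > 0);

    uint32_t *index = calloc((size_t) tiles_across * tiles_down,
                             sizeof *index);
    if (index == NULL) {
        return NULL;
    }
    for (uint32_t i = 0; i < n_frames; i++) {
        /* Convert the 1-based pixel position to a 0-based tile position. */
        uint32_t column =
            (uint32_t) (positions[i].column_position - 1) / tile_width;
        uint32_t row =
            (uint32_t) (positions[i].row_position - 1) / tile_height;
        if (column < tiles_across && row < tiles_down) {
            index[(size_t) row * tiles_across + column] = i + 1;
        }
    }
    return index;
}

/* Look up the frame number at a tile position, or 0 if there is none. */
uint32_t toy_frame_at(const uint32_t *index, uint32_t tiles_across,
                      uint32_t column, uint32_t row)
{
    return index[(size_t) row * tiles_across + column];
}
```

With a table like this, frames can be stored in the file in any order and the caller still asks for tiles by position.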

DcmBOT is no longer exposed in the API, since this is now all handled automatically.

Automatic handling of byte ordering and implicit versus explicit encoding has been improved.

Revised data model

dcm_sequence_foreach() and dcm_dataset_foreach() now take a client pointer, allow early termination, and track the sequence index.
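
The general shape of a foreach with those three features is sketched below over a plain int array. The `toy_` names and signatures are hypothetical and are not the real dcm_sequence_foreach() prototype; check dicom.h for that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: gets the item, its index, and the client pointer.
 * Return false to stop iterating. */
typedef bool (*ToyForeachFn)(const int *item, size_t index, void *client);

/* Returns true if every item was visited, false if fn stopped the loop. */
bool toy_foreach(const int *items, size_t n_items,
                 ToyForeachFn fn, void *client)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n_items; i++) {
        if (!fn(&items[i], i, client)) {
            return false;
        }
    }
    return true;
}

/* Client state for a search: what to look for and where it was found. */
typedef struct {
    int wanted;
    size_t found_at;
} ToySearch;

/* Example callback: record the index of the first match, then stop. */
bool toy_find_cb(const int *item, size_t index, void *client)
{
    ToySearch *search = client;
    if (*item == search->wanted) {
        search->found_at = index;
        return false;
    }
    return true;
}
```

The client pointer removes the need for global state in callbacks, and the boolean return lets a search stop as soon as it has its answer.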

A new function, dcm_element_set_value(), can set the value of an element from a generic byte buffer.

A new function, dcm_element_value_to_string(), makes a formatted character string representing the value of an element. It is handy for displaying values to users in an understandable way.
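
For a feel of the formatting involved, this sketch turns an array of integers into the bracketed style that dcm-dump prints for multi-valued elements, such as "[1, 1]". It is a hypothetical helper for integer values only, not the dcm_element_value_to_string() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format n values as "v" for a single value or "[v1, v2, ...]" for
 * several. Returns false if out is too small to hold the result. */
bool toy_values_to_string(const long *values, size_t n,
                          char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Open the bracket before the first of several values,
         * and separate later values with ", ". */
        const char *prefix = (i == 0) ? (n > 1 ? "[" : "") : ", ";
        int written = snprintf(out + used, out_size - used,
                               "%s%ld", prefix, values[i]);
        if (written < 0 || (size_t) written >= out_size - used) {
            return false;
        }
        used += (size_t) written;
    }
    if (n > 1) {
        if (out_size - used < 2) {
            return false;
        }
        memcpy(out + used, "]", 2);     /* copies ']' and the terminator */
    }
    return true;
}
```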

Some more of dicom-dict.c is in the public API, notably DcmVRClass and associated functions. This was needed by openslide.

Revised logging

There's now dcm_log_set_level() to set the log level (rather than a global variable).

If the environment variable DCM_DEBUG is set, logging defaults to DCM_LOG_DEBUG in dcm_init(). For example:

$ DCM_DEBUG=1 dcm-getframe -o x.jpg 1.3.6.1.4.1.36533.341664110819124279227187194203724415118298.dcm 1
INFO     [Sun Jun 25 15:07:56 2023] - Read filehandle '1.3.6.1.4.1.36533.341664110819124279227187194203724415118298.dcm'
INFO     [Sun Jun 25 15:07:56 2023] - Read frame 1
DEBUG    [Sun Jun 25 15:07:56 2023] - Read frame number #1.
DEBUG    [Sun Jun 25 15:07:56 2023] - Create Data Set.
DEBUG    [Sun Jun 25 15:07:56 2023] - Read Data Element body '00020001'
DEBUG    [Sun Jun 25 15:07:56 2023] - Read Data Element body '00020002'
...
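
The pattern is roughly the one below: a setter instead of a global variable, plus a startup default taken from the environment. Everything here is a sketch with hypothetical `toy_` names and made-up level values; in particular, the WARNING default used when DCM_DEBUG is unset is an assumption for the example, not libdicom's documented default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical levels; the numeric values are arbitrary. */
typedef enum {
    TOY_LOG_DEBUG = 10,
    TOY_LOG_INFO = 20,
    TOY_LOG_WARNING = 30
} ToyLogLevel;

/* Private to this file: callers go through the setter. */
static ToyLogLevel toy_log_level = TOY_LOG_WARNING;

void toy_log_set_level(ToyLogLevel level)
{
    assert(level >= TOY_LOG_DEBUG && level <= TOY_LOG_WARNING);
    toy_log_level = level;
}

/* Choose the startup level from the value of the environment variable
 * (NULL when it is unset). Any value, even an empty one, enables debug. */
ToyLogLevel toy_default_level(const char *env_value)
{
    return env_value != NULL ? TOY_LOG_DEBUG : TOY_LOG_WARNING;
}

/* Called once at startup. */
void toy_init(void)
{
    toy_log_set_level(toy_default_level(getenv("DCM_DEBUG")));
}

/* True if a message at this level should be printed. */
bool toy_log_enabled(ToyLogLevel level)
{
    return level >= toy_log_level;
}
```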

New dcm-dump

There's a new function, dcm_filehandle_print(), which prints all metadata in a file, including pixeldata. This is the function used by dcm-dump.

The output looks like this:

$ dcm-dump 1.3.6.1.4.1.36533.341664110819124279227187194203724415118298.dcm
===File Meta Information===
(0002,0002) FileMetaInformationVersion | OB | 2 | <binary value of 2 bytes>
(0002,0002) MediaStorageSOPClassUID | UI | 30 | 1.2.840.10008.5.1.4.1.1.77.1.6
(0002,0002) MediaStorageSOPInstanceUID | UI | 60 | 1.3.6.1.4.1.36533.341664110819124279227187194203724415118298
(0002,0002) TransferSyntaxUID | UI | 22 | 1.2.840.10008.1.2.4.50
(0002,0002) ImplementationClassUID | UI | 10 | 2.16.840.1
(0002,0002) ImplementationVersionName | SH | 12 | GT450_1_0_1
(0002,0002) SourceApplicationEntityTitle | AE | 16 | Leica ScnUtility
===Dataset===
(0008,0008) ImageType | CS | 30 | [ORIGINAL, PRIMARY, OVERVIEW, NONE]
(0008,0008) SOPClassUID | UI | 30 | 1.2.840.10008.5.1.4.1.1.77.1.6
(0008,0008) SOPInstanceUID | UI | 60 | 1.3.6.1.4.1.36533.341664110819124279227187194203724415118298
(0008,0008) StudyDate | DA | 0 | 
(0008,0008) ContentDate | DA | 8 | 20210720
(0008,0008) AcquisitionDateTime | DT | 20 | 20210720114626+0100
(0008,0008) StudyTime | TM | 0 | 
...
(2200,2200) LabelText | UT | 0 | 
(2200,2200) BarcodeValue | LT | 16 | LH20-59902_1_2_1
(5200,5200) SharedFunctionalGroupsSequence [
  ---Item #1---
  (0028,0028) PixelMeasuresSequence [
    ---Item #1---
    (0018,0018) SliceThickness | DS | 2 | 0
    (0028,0028) PixelSpacing | DS | 34 | [0.03916449099779, 0.03916449099779]
  ]
  (0048,0048) OpticalPathIdentificationSequence [
    ---Item #1---
    (0048,0048) OpticalPathIdentifier | SH | 2 | 0
  ]
]
(5200,5200) PerFrameFunctionalGroupsSequence [
  ---Item #1---
  (0020,0020) FrameContentSequence [
    ---Item #1---
    (0018,0018) FrameAcquisitionDateTime | DT | 20 | 20210720114626+0100
    (0018,0018) FrameReferenceDateTime | DT | 20 | 20210720114626+0100
    (0018,0018) FrameAcquisitionDuration | FD | 8 | 0.0242
    (0020,0020) DimensionIndexValues | UL | 8 | [1, 1]
  ]
  (0048,0048) PlanePositionSlideSequence [
    ---Item #1---
    (0040,0040) XOffsetInSlideCoordinateSystem | DS | 14 | 23.69451713562
    (0040,0040) YOffsetInSlideCoordinateSystem | DS | 16 | 58.511749267578
    (0040,0040) ZOffsetInSlideCoordinateSystem | DS | 2 | 0
    (0048,0048) ColumnPositionInTotalImagePixelMatrix | SL | 4 | 1
    (0048,0048) RowPositionInTotalImagePixelMatrix | SL | 4 | 1
  ]
]
(7fe0,7fe0) PixelData | OB | 4294967295 [
  frame 0 | 0 | 
  frame 1 | 178960 | ff d8 ff ee 00 0e 41 64 6f 62 65 00 64 00 00 00 00 ...
]

You can see it's displaying the first few bytes of the frame in hex. It also handles OW, OF and OD, and both encapsulated and native pixeldata.
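
A hex preview like the one on the frame line can be produced with something along these lines. This is a hypothetical helper written for illustration, not the code behind dcm_filehandle_print(); the byte limit is a parameter here because the source does not say what limit the real tool uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write up to max_bytes of data as space-separated hex pairs, followed
 * by "..." when the data was truncated. Returns false if out is too
 * small to hold the result. */
bool toy_hex_preview(const uint8_t *data, size_t length, size_t max_bytes,
                     char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    size_t shown = length < max_bytes ? length : max_bytes;
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < shown; i++) {
        int written = snprintf(out + used, out_size - used, "%s%02x",
                               i == 0 ? "" : " ", (unsigned) data[i]);
        if (written < 0 || (size_t) written >= out_size - used) {
            return false;
        }
        used += (size_t) written;
    }
    if (shown < length) {
        /* Mark the preview as incomplete. */
        const char *tail = shown == 0 ? "..." : " ...";
        size_t tail_size = strlen(tail) + 1;
        if (out_size - used < tail_size) {
            return false;
        }
        memcpy(out + used, tail, tail_size);
    }
    return true;
}
```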

It has the very nice feature of printing as it parses, so it can display DICOM files of any size very quickly and using only a little memory.

New argument parser

The two tools now use something like getopt to parse command-line arguments and switches, making them more consistent and easier to use.

jcupitt commented 1 year ago

Sorry this thing is so huge Markus :( But I think it's done now, modulo the inevitable various bugs and typos, of course.

I would just read dicom.h and the docs, and wave the rest through with a sigh. Checking the whole PR is probably too much work for any sane person. Now it's at 1.0, future changes should be small.

I put a copy of the formatted docs here for your convenience:

http://www.rollthepotato.net/~john/libdicom/

I'll update openslide to this revised API.

jcupitt commented 1 year ago

There's an incomplete and experimental python binding here:

https://github.com/jcupitt/pydicom

pydicom is a very bad name (it's taken); we should think of something better.

jcupitt commented 1 year ago

This PR replaces #55. That old PR has some benchmarks and screenshots which might still be useful.

jcupitt commented 1 year ago

I would just read dicom.h and the docs, and wave the rest through with a sigh.

Another option would be to reopen #55 and merge that. It would make this PR look a lot smaller.

jcupitt commented 9 months ago

Looks like the Windows runner is failing for some config reason. Maybe %PATH% is messed up? Hopefully they'll fix it soon.

WARNING: Found pkg-config 'C:\\Strawberry\\perl\\bin\\pkg-config.BAT' but it failed when run

fedorov commented 9 months ago

There is an error about missing Ninja (https://github.com/ImagingDataCommons/libdicom/actions/runs/6283904801/job/17064777492?pr=56#step:6:210) - maybe try adding its setup as part of the workflow? https://github.com/marketplace/actions/install-ninja-build-tool

bgilbert commented 9 months ago

Looks like Ninja was recently removed from the Windows images: https://github.com/actions/runner-images/issues/8348

jcupitt commented 9 months ago

Woo! I'll kick the tyres a bit, then tag it as 1.0-rc1 for downstream integration and further tests.