ImagingDataCommons / libdicom

C library for reading DICOM files
https://libdicom.readthedocs.io
MIT License
15 stars 7 forks source link

Revise dicom-data #43

Closed jcupitt closed 1 year ago

jcupitt commented 1 year ago

This (huge) PR revises dicom-data with the following (hopefully) improvements:

No strings for VR

There's an enum for the VRs and their combinations defined in dicom.h:

https://github.com/jcupitt/libdicom/blob/add-vr-enum/include/dicom/dicom.h#L181-L228

Removing the strcmp()s speeds the parser up by about 20%. The enum lets us index a table of VR properties:

https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-dict.c#L48-L158

Fewer mallocs

  1. The IHeader and EHeader objects no longer exist (we use locals instead, and element_read_header())
  2. DcmElement hides VM 1 values (a very common case) inside the object itself, saving a malloc https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-data.c#L29-L82

Generic getters and setters

Instead of a getter and setter for each VR, there's now a generic getter and setter for each general class of VR. For example:

https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-dict.c#L48-L158

Can be used to set the value of any element with a string-style VR.

Revised memory management policy

libdicom objects only take ownership of references created by libdicom.

If you pass in a pointer to an object created by another library, you can use the steal flag to say whether you want libdicom to steal ownership, or make a copy.

https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-data.c#L512-L515

This is necessary for integration with eg. python.

Updated docs

I've tried to revise all the docs to match.

Revised parser

dicom-file.c is updated to match and tidied up a bit.

All tests pass

And it can read frames from Leica and 3D Histech dicoms successfully and with no memory leaks.

Sample code

Here's a (slightly silly) demo of the revised API:

DcmElement *make_title(DcmError **error, const char *value)
{
    DcmElement *element;

    element = dcm_element_create(&error, 0x00020016, DCM_VR_AE);
    if (element == NULL) {
        return NULL;
    }

    if (!dcm_element_set_value_string(NULL, element, value, false)) {
        dcm_element_destroy(element);
        return NULL;
    }

    return element;
}
  1. Element creation is split into two, with dcm_element_create() making a header, and a set of functions like dcm_element_set_value_string() attaching a value.
  2. The false on dcm_element_set_value_string() means don't steal the string pointer, ie. make a copy of the value that the element will own.
  3. If dcm_element_set_value_string() fails, the element is not destroyed. Failed operations leave everything as it was before the function was called. All other dcm_ functions now work this way.
  4. It knows what tags can take what VRs, including tags that can take multiple VRs, so dcm_element_create() will fail if the combination is now allowed.
  5. Equally, it knows what VRs can take what values, so dcm_element_set_value_string() will fail for non-string elements.
  6. If the VR allows VM > 1, it'll automatically parse the string for you.
  7. The numeric VRs have separate getters and setters for int and float values that will cast appropriately.
  8. Though getting and setting arrays of values requires you to get the exact numeric format correct.
jcupitt commented 1 year ago

OK, I think this is now ready for review.

jcupitt commented 1 year ago

@bgilbert would you mind approving this? I need it for the openslide loader.

bgilbert commented 1 year ago

This is a larger review than I'm able to do right now, unfortunately.

jcupitt commented 1 year ago

Ah that's fine, I wasn't expecting a review, I agree that'd be an unreasonable amount of work. I don't think the PR + review model works that well early in a project's life.

I'll close this and switch development to my fork. We can update this repo later as time allows.

hackermd commented 1 year ago

@jcupitt I think it would be nice for development increments to be documented and included in the main repo. Would it make sense to merge such changes into a dev branch instead? This branch could be less protected and allow you to progress without having to wait for approval.

jcupitt commented 1 year ago

Sure, that sounds like a good idea. Thanks Markus!

hackermd commented 1 year ago

@jcupitt I had a quick look at the changes. Here are a few initial thoughts:

jcupitt commented 1 year ago

Value setters: I considered setters initially, but instead decided in favour of immutable elements.

Elements now have an assigned flag, so setters can only be called once, and getters will fail if there's no value. So elements are still immutable, I think.

This PR doesn't check that an element has been assigned when adding it to a dataset, that'd be a good idea. I'll try to add that now.

Another advantage of separating element create and value assign is the much smaller API. It should make a python binding for libdicom quite a bit easier.

jcupitt commented 1 year ago

I was concerned that an enum in dicom.h may break the ABI if a new VR gets introduced into the standard.

If new enums are appended (as opposed to inserted earlier in the list), the ABI won't change. You're right that inserting new VRs early on would cause a renumbering and break binary backwards compatibility.

jcupitt commented 1 year ago

... actually, looking at it again, we can improve ABI compatibility on VR additions. I added another commit.

hackermd commented 1 year ago

@jcupitt I've changed the base branch to dev as discussed. I would in principle also be fine with merging directly into main during this stage of development. However, it may be nicer to develop in a separate branch for now and preserve the ability for @bgilbert to more thoroughly review changes in comparison to the main branch.

bgilbert commented 1 year ago

Either dev or main works for me. I'm not planning to closely review changes, and am fine with them merging directly to main. Mainly I'd prefer to avoid approving PRs I haven't actually looked at.

jcupitt commented 1 year ago

OK, merged to -dev. Thanks very much!

Shall I continue to make PRs to dev, or just push changes there? I think it depends how much time you have to review stuff, Markus. I imagine there will be quite a bit of flux on libdicom for another month or two.

hackermd commented 1 year ago

Shall I continue to make PRs to dev, or just push changes there? I think it depends how much time you have to review stuff, Markus. I imagine there will be quite a bit of flux on libdicom for another month or two.

I think it's nice to use PRs for individual features (scoped work packages) that correspond to one issue. Even if @bgilbert and I currently don't have the time to review each PR in depth, the PR is in my opinion still useful for us to provide feedback and discuss at a high level (as we did on this PR) without holding up development progress. What do you think?

hackermd commented 1 year ago

In general, I prefer the following git workflow:

The main goals are to ensure that main is always installable and that any change that gets introduced into main results in a version increment.

Ideally, the changes for implementation of each feature (or refactoring) would be reviewed thoroughly when the PR gets merged into dev. However, I am fine with delaying that until the next release so that we can iterate faster. At a minimum, the PR should pass all automated checks (unit tests, etc.).

jcupitt commented 1 year ago

OK, I'll make a set of PRs to dev.

Off-topic, but here's a DICOM WSI image being displayed!

image

That's libdicom being called from openslide, and openslide being called from libvips, and vipsdisp calling libvips and displaying the image!

There are a few minor rendering problems still, I'll try to fix them tomorrow.

jcupitt commented 1 year ago

The prototype openslide DICOM loader: https://github.com/jcupitt/openslide/blob/add-libdicom-prototype/src/openslide-vendor-dicom.c

hackermd commented 1 year ago

@jcupitt that's awesome!!! So cool to seeing libdicom in action.

hackermd commented 1 year ago

cc @dclunie @fedorov

jcupitt commented 1 year ago

I fixed the rendering errors and made a tiny video showing the thing in action:

http://www.rollthepotato.net/~john/untitled.mp4

That's the 3dhistech demo WSI DICOM being displayed. These DICOMs display really nicely, it's only a few ms to scan all of the levels in a directory and open them as a pyramid. The Leica DICOMs will be a lot more challenging, unfortunately.

openslide and libvips have parallel decode, which is cool. It's using libdicom to get the frames from the file (this is single threaded, since it's using seek + read to grab a tile), but then the jpeg decode of each tile (the slow part) is threaded.

I made the viewer last year -- that's loading the tiles as libvips generates them into your GPU as textures, them drawing the display by culling and then compositing the tree of visible tiles. You can pan and zoom at a pretty smooth 60fps, even on a large display.

The viewer is on flathub: https://flathub.org/apps/details/org.libvips.vipsdisp

Though obv. that binary doesn't have DICOM support.