Closed jcupitt closed 1 year ago
OK, I think this is now ready for review.
@bgilbert would you mind approving this? I need it for the openslide loader.
This is a larger review than I'm able to do right now, unfortunately.
Ah that's fine, I wasn't expecting a review, I agree that'd be an unreasonable amount of work. I don't think the PR + review model works that well early in a project's life.
I'll close this and switch development to my fork. We can update this repo later as time allows.
@jcupitt I think it would be nice for development increments to be documented and included in the main repo. Would it make sense to merge such changes into a dev
branch instead? This branch could be less protected and allow you to progress without having to wait for approval.
Sure, that sounds like a good idea. Thanks Markus!
@jcupitt I had a quick look at the changes. Here are a few initial thoughts:
dicom.h
may break the ABI if a new VR gets introduced into the standard. Have you considered this? However, I am not sure whether this is a valid concern and whether there is a change that new VRs will be introduced into the standard any time soon. @dclunie what are your thoughts on this?steal
flag. That's super useful.dcm_element_create_*()
functions to be constructor functions that create an immutable object of a given "data type" (i.e., value representation). Making data elements immutable has several advantages and simplifies downstream logic, because one doesn't always have to check whether a value has already been set (when cloning an element, adding an element to a dataset, etc.) or prevent users from overriding existing values.Value setters: I considered setters initially, but instead decided in favour of immutable elements.
Elements now have an assigned
flag, so setters can only be called once, and getters will fail if there's no value. So elements are still immutable, I think.
This PR doesn't check that an element has been assigned when adding it to a dataset, that'd be a good idea. I'll try to add that now.
Another advantage of separating element create and value assign is the much smaller API. It should make a python binding for libdicom quite a bit easier.
I was concerned that an enum in dicom.h may break the ABI if a new VR gets introduced into the standard.
If new enums are appended (as opposed to inserted earlier in the list), the ABI won't change. You're right that inserting new VRs early on would cause a renumbering and break binary backwards compatibility.
... actually, looking at it again, we can improve ABI compatibility on VR additions. I added another commit.
@jcupitt I've changed the base branch to dev
as discussed. I would in principle also be fine with merging directly into main
during this stage of development. However, it may be nicer to develop in a separate branch for now and preserve the ability for @bgilbert to more thoroughly review changes in comparison to the main branch.
Either dev
or main
works for me. I'm not planning to closely review changes, and am fine with them merging directly to main
. Mainly I'd prefer to avoid approving PRs I haven't actually looked at.
OK, merged to -dev
. Thanks very much!
Shall I continue to make PRs to dev, or just push changes there? I think it depends how much time you have to review stuff, Markus. I imagine there will be quite a bit of flux on libdicom for another month or two.
Shall I continue to make PRs to dev, or just push changes there? I think it depends how much time you have to review stuff, Markus. I imagine there will be quite a bit of flux on libdicom for another month or two.
I think it's nice to use PRs for individual features (scoped work packages) that correspond to one issue. Even if @bgilbert and I currently don't have the time to review each PR in depth, the PR is in my opinion still useful for us to provide feedback and discuss at a high level (as we did on this PR) without holding up development progress. What do you think?
In general, I prefer the following git workflow:
dev
branch -> each PR closes one feature request issuedev
branch with PRs targeting main
branch (and tagging of main
branch with minor or major version increment, depending on the nature of the changes)main
branch (and rebasing dev
branch onto updated main
branch and tagging of main
branch with patch version increment) -> each PR closes one bug report issueThe main goals are to ensure that main
is always installable and that any change that gets introduced into main
results in a version increment.
Ideally, the changes for implementation of each feature (or refactoring) would be reviewed thoroughly when the PR gets merged into dev
. However, I am fine with delaying that until the next release so that we can iterate faster. At a minimum, the PR should pass all automated checks (unit tests, etc.).
OK, I'll make a set of PRs to dev.
Off-topic, but here's a DICOM WSI image being displayed!
That's libdicom being called from openslide, and openslide being called from libvips, and vipsdisp calling libvips and displaying the image!
There are a few minor rendering problems still, I'll try to fix them tomorrow.
The prototype openslide DICOM loader: https://github.com/jcupitt/openslide/blob/add-libdicom-prototype/src/openslide-vendor-dicom.c
@jcupitt that's awesome!!! So cool to seeing libdicom in action.
cc @dclunie @fedorov
I fixed the rendering errors and made a tiny video showing the thing in action:
http://www.rollthepotato.net/~john/untitled.mp4
That's the 3dhistech demo WSI DICOM being displayed. These DICOMs display really nicely, it's only a few ms to scan all of the levels in a directory and open them as a pyramid. The Leica DICOMs will be a lot more challenging, unfortunately.
openslide and libvips have parallel decode, which is cool. It's using libdicom to get the frames from the file (this is single threaded, since it's using seek + read to grab a tile), but then the jpeg decode of each tile (the slow part) is threaded.
I made the viewer last year -- that's loading the tiles as libvips generates them into your GPU as textures, them drawing the display by culling and then compositing the tree of visible tiles. You can pan and zoom at a pretty smooth 60fps, even on a large display.
The viewer is on flathub: https://flathub.org/apps/details/org.libvips.vipsdisp
Though obv. that binary doesn't have DICOM support.
This (huge) PR revises dicom-data with the following (hopefully) improvements:
No strings for VR
There's an enum for the VRs and their combinations defined in dicom.h:
https://github.com/jcupitt/libdicom/blob/add-vr-enum/include/dicom/dicom.h#L181-L228
Removing the strcmp()s speeds the parser up by about 20%. The enum lets us index a table of VR properties:
https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-dict.c#L48-L158
Fewer mallocs
IHeader
andEHeader
objects no longer exist (we use locals instead, andelement_read_header()
)DcmElement
hides VM 1 values (a very common case) inside the object itself, saving a malloc https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-data.c#L29-L82Generic getters and setters
Instead of a getter and setter for each VR, there's now a generic getter and setter for each general class of VR. For example:
https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-dict.c#L48-L158
Can be used to set the value of any element with a string-style VR.
Revised memory management policy
libdicom objects only take ownership of references created by libdicom.
If you pass in a pointer to an object created by another library, you can use the
steal
flag to say whether you want libdicom to steal ownership, or make a copy.https://github.com/jcupitt/libdicom/blob/add-vr-enum/src/dicom-data.c#L512-L515
This is necessary for integration with eg. python.
Updated docs
I've tried to revise all the docs to match.
Revised parser
dicom-file.c
is updated to match and tidied up a bit.All tests pass
And it can read frames from Leica and 3D Histech dicoms successfully and with no memory leaks.
Sample code
Here's a (slightly silly) demo of the revised API:
dcm_element_create()
making a header, and a set of functions likedcm_element_set_value_string()
attaching a value.false
ondcm_element_set_value_string()
means don't steal the string pointer, ie. make a copy of the value that the element will own.dcm_element_set_value_string()
fails, the element is not destroyed. Failed operations leave everything as it was before the function was called. All otherdcm_
functions now work this way.dcm_element_create()
will fail if the combination is now allowed.dcm_element_set_value_string()
will fail for non-string elements.