ImagingDataCommons / libdicom

C library for reading DICOM files
https://libdicom.readthedocs.io
MIT License
16 stars 7 forks source link

Allow caller to provide its own file I/O #18

Closed bgilbert closed 1 year ago

bgilbert commented 1 year ago

Some applications may want to use their own file abstraction rather than having libdicom read directly from the filesystem. (For example, the application may have a DICOM File in memory and want to avoid writing it to a temporary file.) Please consider providing an alternative to dcm_file_create() that takes an opaque pointer plus read/seek/eof callback functions that operate on it.

hackermd commented 1 year ago

I have been considering this before but was unsure about how to best implement such an abstraction. I think your suggestion is great.

If I understand you correctly, we could provide a dcm_file_create_from_reader() (or something along those lines, which would receive a FILE-like object along with the callback functions for data access. The dcm_file_create() function could then reuse that function under the hood and supply the fread/fseek/feof functions as read/seek/eof callback functions. If we store those as function pointers on DcmFile objects, the interface of its functions (e.g., dcm_file_read_metadata()) would not have to change.

The library currently does not yet support writing, but we may want to consider adding a write callback at this point, so that we won't have to change the API later. What do you think @bgilbert?

bgilbert commented 1 year ago

Yup, that's the idea. It'd be worth spending a bit of time to define the callback function semantics, since the fread/fseek/feof signatures can't be reused verbatim anyway. (They take FILE * rather than void *.) It might be worth combining the two size arguments in fread, and possibly defining its return value to be signed as read(2) does so a separate feof isn't needed. dcm_file_create can then configure the DcmFile object with small wrappers around fread/fseek/feof.

Re writing, if there isn't an intention to support both reading and writing in the same DcmFile, it might be a bit cleaner to create separate dcm_file_create_from_reader and dcm_file_create_from_writer functions. The latter could then be deferred until the library supports writing. I'd suggest splitting dcm_file_create the same way, too, if the current API isn't considered stable yet.

jcupitt commented 1 year ago

We went though this a while ago with libvips and my feeling was that there are some problems with stdio on Windows:

Plus the benefits of stdio are rather small. You can implement buffered read and write in under 100 lines of C, and have better EOF handling.

We went for the base open/read/write/seek calls, with user callbacks that exactly match the POSIX specs for these functions. This has the advantage of avoiding stdio, and giving users a well documented and understood interface. We jumped off the spec and used 64 bit ints for offset and size.

bgilbert commented 1 year ago

Strictly speaking, the callback API and the default I/O handling are separate concerns, right? I agree that the unbuffered I/O API presents a cleaner interface for the callbacks. libdicom can then map that onto whatever default I/O implementation it prefers.

hackermd commented 1 year ago

Since DICOM data are often communication over network (DICOMweb), a clean abstraction of the underlying data access operations via the callback API would facilitate reading data sets from in memory file representations. I think this would also be useful for other language bindings (e.g., JavaScript/WASM where one would rarely access files on disk).

jcupitt commented 1 year ago

Implemented with https://github.com/ImagingDataCommons/libdicom/pull/36