[toimage]: bulk conversion working

PierreBou91 commented 11 months ago

As per #343 I have implemented the possibility to convert multiple dicoms to a folder of images.

For consistency, I have copied a lot of the logic from the dicom-storescu but I don't think it's worth extracting it yet.

Here are my questions/concerns about this commit and I need some guidance:

1. Not tested on multiframe dicom: I am very unfamiliar with multiframe dicoms and I don't really know where to find or create proper ones. I was not able to test this commit with multiframe dicoms. The basic functionality should be preserved but even though, I'm not certain that this is the expected behavior (convert the nth frame of every multiframe dicom in the input). This is by far my biggest concern for this commit because there might be regressions and I don't know of an acceptable way to handle this since the use case (multiframe to image) is unclear to me.

2. Error enum with only Error::Other variant: Initially I copied the Error enum from dicom-storescu required in the check_file function return type that I also imported. But since Error::Other is the only variant, maybe should it just be removed altogether ?

3. output flag was replaced by out-dir: This has several consequences:

The possibility to rename the output file for a single conversion is not retained.
If several files have the same name in the set to be converted they will be overridden except for the last one being converted.

I just went to the simplest implementation that matched the specs in #343. If the mentioned consequences are not acceptable, I can try and find some ways to fix them.

4. The format flag currently takes a String: It works as is but since the image crate is exposing an image::ImageFormat enum, this can be a good base for clap's checking and suggestions. I refrained from doing it because I didn't know if every image format should be allowed or any other restriction I don't know about.

Overall this commit is far from perfect but I still opened this PR since I already find it more convenient than DCMTK's equivalent binary.

Enet4 commented 11 months ago

Thank you for working on this. As I suspect that we might need some time to review and I was about to bring forward a new release of DICOM-rs, I propose that this is worked on after 0.6.0 is out. This should not stop me from publishing this enhancement in a subsequent minor release anyway.

PierreBou91 commented 11 months ago

Oh yeah sure, I think you're right ! I'll check in back here when the 0.6.0 is out.

Enet4 commented 11 months ago

1. Not tested on multiframe dicom: I am very unfamiliar with multiframe dicoms and I don't really know where to find or create proper ones. I was not able to test this commit with multiframe dicoms. The basic functionality should be preserved but even though, I'm not certain that this is the expected behavior (convert the nth frame of every multiframe dicom in the input). This is by far my biggest concern for this commit because there might be regressions and I don't know of an acceptable way to handle this since the use case (multiframe to image) is unclear to me.

DICOM WG4 has a data set with some example files, and the pydicom test files also have some files which are multi-frame. Running cargo test on the project already downloads some of these via dicom-test-files (see for example pydicom/color3d_jpeg_baseline.dcm).

In practice, I can't be sure which behavior is more suitable, but we can either a) pick the first frame only, or b) print all frames into separate files with an extension to the file name, e.g. abc.dcm to abc_0.png, abc_1.png, abc_2.png,...

2. Error enum with only Error::Other variant: Initially I copied the Error enum from dicom-storescu required in the check_file function return type that I also imported. But since Error::Other is the only variant, maybe should it just be removed altogether ?

Generally, we would go in the opposite direction from here: given this Error type, we can slowly add more specific error variants and replace context calls accordingly.

3. output flag was replaced by out-dir: This has several consequences:

The possibility to rename the output file for a single conversion is not retained.

If several files have the same name in the set to be converted they will be overridden except for the last one being converted.

I just went to the simplest implementation that matched the specs in #343. If the mentioned consequences are not acceptable, I can try and find some ways to fix them.

I admit that this was not made clear in the original issue, but I would prefer that the functionality to convert a single image is retained. It is OK if the code ends up with a bit of redundancy. --out-dir and -o should be marked as conflicting with each other, so it's either one or the other.

4. The format flag currently takes a String: It works as is but since the image crate is exposing an image::ImageFormat enum, this can be a good base for clap's checking and suggestions. I refrained from doing it because I didn't know if every image format should be allowed or any other restriction I don't know about.

That should be fine. At least adding support for extra image formats is as simple as updating image and its feature set.

PierreBou91 commented 11 months ago

Thanks for your guidance @Enet4 !

I made good progress on this PR, but I have an issue and I'm not sure what is the best way to handle it:

To select the image format, it's only necessary to set the extension of the PathBuf to the target extension.

So far, I took the original file name and set the extension but I realized there is an issue with filenames that don't include an extension and include a "." which is unfortunately pretty common. In such cases, methods like file_stem or set_extension don't work correctly since they consider the last portion of the filename (after the last ".") to be an extension so it get lost.

For example given a file f with a filename of "1.2.276.0.123":

calling set_extension("png") on f would return "1.2.276.0.png" (losing the last part)
calling file_stem() on f would return "1.2.276.0" (also losing the last part)

Below is a code snippet to illustrate:

use std::{ffi::OsStr, path::PathBuf};

fn main() {
    let mut path = PathBuf::from("1.2.276.0.123");
    path.set_extension("png");
    assert_eq!(path, PathBuf::from("1.2.276.0.png"));

    let path = PathBuf::from("1.2.276.0.123");
    let stem = path.file_stem().unwrap();
    assert_eq!(stem, OsStr::new("1.2.276.0"));
}

As for the workaround, here are the propositions I though of:

We could force the user to pass a filename command giving some template of the filename based on some dicom tags. example: --filename "{PatientName}-{SOPInstanceUID}" This would allow to only have to append the target to this pattern and be done with it. I implemented a similar logic on an other project and it works fine if the errors (element not found) are correctly handled. The main downside of this is that it introduces complexity for the user.
We could just append the target extension regardless of the original filename. (I think that's how dcmtk behaves) This would work but we could easily end up with filenames that end in "123.dcm.png" which is not so pretty.
We could force filenames to be something predictable like a timestamp. The downside would be that matching original files with their images is harder.

If someone has an other suggestion as well or any opinion on the best one it would be great.

Thanks

Enet4 commented 11 months ago

I would detect .dcm and only replace the extension in this case, extending the file name in all other cases.

PierreBou91 commented 11 months ago

Hello,

This should be good.

Multiframe images:
- The user can pass the -F flag to ask for every frame to be converted
- If -F is passed, every image will have a _n appended before the extension
- By default, only the first frame will be converted
Error handling:
- I added several other errors to make a better use of Snafu
Single file conversion:
- The user can pass a single file (instead of a folder)
- The user can use the -O option to rename the single file (excluding the extension that should be set with t)
- The -O option will be ignored with a warning if multiple files are passed for conversion
File names:
- dicom-toimage correctly handles name collisions by appending (n) before the extension if a file already has the name in the output folder.

Please tell me if any other change should be made, I'll gladly oblige.

PierreBou91 commented 10 months ago

Hey,

Thanks for the testing and feedbacks!

I'm currently abroad but I'll make sure to make the modifications as soon as I'm back and have a little time. (likely during a weekend of September).

PierreBou91 commented 9 months ago

Hey hello, @Enet4

I am having trouble resolving conflicts in a clean manner. So I plan on starting from a new PR directly from master currents version of the toimage crate. So unless you have a better suggestion, I think we can close this PR.

I just have a quick question. I see that on master's version of main, you have extended the errors and also attributed some specific exit codes. Is there a specification on how exit codes should be attributed or should I just decrease as I add more errors (and group the one that share the same source as I think this is what you've done ?

Thanks !

Pierre

Enet4 commented 9 months ago

Hello @PierreBou91!

So unless you have a better suggestion, I think we can close this PR.

It is fine either way. It should also be possible to keep the pull request by hard-resetting the branch and doing a forced push.

As for the exit code, it was primarily to differentiate between different kinds of errors, but none of it is intended to follow a stable guideline, so feel free to rearrange them as needed.

PierreBou91 commented 9 months ago

I hard reset and force-pushed as advised and reimplemented the bulk conversion.

User can now pass a path to a file, multiple files or a folder (possibly recursively with -r flag). All the file paths will be collected in a Vec<InOuPair> and the output path of will be directly computed according to the --output to chose the name, --ext to chose the extension and, --dir to chose an output directory. The previous run() function is essentially unchanged, just moved in the convert_file function.

I have not reimplemented the frame selection for multiframe dicoms because I'm not sure yet how to do it with the --unwrap flag. I'll try to research the fragments because I'm not familiar with it and the vast majority of the dicoms I have tested did not successfully converted with --unwrap (only the multiframe dicom example worked fine). Do you think we could see the multiframe part in a different PR ?

Please tell me if you see any improvements to be made. I also intent on doing an improved error handling for the inventory part.

Enet4 / dicom-rs

[toimage]: bulk conversion working #397