Enet4 / dicom-rs

Rust implementation of the DICOM standard
https://dicom-rs.github.io
Apache License 2.0
403 stars 75 forks source link

Non-deterministic behavior in dicom-storescu function check_presentation_contexts #473

Open jennydaman opened 3 months ago

jennydaman commented 3 months ago

https://github.com/Enet4/dicom-rs/blob/dbd41ed3a0d1536747c6b8ea2b286e4c6e8ccc8a/storescu/src/main.rs#L213-L235

I am trying to use the code of dicom-storescu, though some non-deterministic behavior is making my tests flaky.

Get some example data here: https://github.com/FNNDSC/SAG-anon

I edited storescu/src/main.rs for debugging:

diff --git a/storescu/src/main.rs b/storescu/src/main.rs
index ca29e91..daecb3b 100644
--- a/storescu/src/main.rs
+++ b/storescu/src/main.rs
@@ -224,6 +224,8 @@ fn run() -> Result<(), Error> {
                 }
                 file.pc_selected = Some(pc);
                 file.ts_selected = Some(ts);
+                dbg!(&file.ts_selected);
+                std::process::exit(0);
             }
             Err(e) => {
                 error!("{}", Report::from_error(e));

In one terminal window, I am running

cargo run --bin dicom-storescp

And in another, I am running

cargo run --bin dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm

Here's the output from several back-to-back repeated runs:

$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2.1",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2.1",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2.1",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2.1",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2.1",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2.1",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2.1",
)
$ ./target/debug/dicom-storescu 127.0.0.1:11111 ~/fnndsc/public_data/SAG-anon/0002-1.3.12.2.1107.5.2.19.45152.2013030808110261698786039.dcm
[storescu/src/main.rs:227:17] &file.ts_selected = Some(
    "1.2.840.10008.1.2",
)

Over 10 runs, 7 times the transfer syntax UID is "1.2.840.10008.1.2.1" and three times I get "1.2.840.10008.1.2" instead.

jennydaman commented 3 months ago

I have found out that scu.presentation_contexts() returns unsorted.

diff --git a/storescu/src/main.rs b/storescu/src/main.rs
index ca29e91..0c64a16 100644
--- a/storescu/src/main.rs
+++ b/storescu/src/main.rs
@@ -209,6 +209,11 @@ fn run() -> Result<(), Error> {
     }

     for file in &mut dicom_files {
+
+        let pcs: Vec<_> = scu.presentation_contexts().iter().map(|pc| pc.transfer_syntax.as_str()).collect();
+        dbg!(pcs);  // order is not guaranteed
+        std::process::exit(0);
+
         // identify the right transfer syntax to use
         let r: Result<_, Error> =
             check_presentation_contexts(file, scu.presentation_contexts(), never_transcode)
Enet4 commented 3 months ago

Thank you for reporting. This might have to do with the use of a hash set to keep track of the presentation contexts to propose, in line 134.

    let mut presentation_contexts = HashSet::new();

Changing it to a BTreeSet should clear the non-deterministic behavior. Would you like to try this and send a PR if it works?

jennydaman commented 3 months ago

That could be it! Though I shouldn't be making a PR about this yet. I myself am not fluent in DICOM and I am not sure which is correct (or better), 1.2.840.10008.1.2 or 1.2.840.10008.1.2.1.

I think the choice of which collection to use shouldn't matter, instead we should figure out how to make the reducer deterministic.

https://github.com/Enet4/dicom-rs/blob/dbd41ed3a0d1536747c6b8ea2b286e4c6e8ccc8a/storescu/src/main.rs#L514-L518

Could you explain the context and purpose of |ts| file_ts.is_codec_free() && ts.is_codec_free()?

Enet4 commented 3 months ago

That could be it! Though I shouldn't be making a PR about this yet. I myself am not fluent in DICOM and I am not sure which is correct (or better), 1.2.840.10008.1.2 or 1.2.840.10008.1.2.1.

The way I see it, the SCU can propose both to the SCP without issue. When sending files, it should pick one of them if it's an exact match with a presentation context (so that implicit VR LE files are also sent as they are, for example). In any case, if one of these two is not proposed, or the association acceptor only accepts one of them, the tool is already capable of transcoding on the spot (e.g. Implicit VR LE to Explicit VR LE).

I think the choice of which collection to use shouldn't matter, instead we should figure out how to make the reducer deterministic.

It matters because HashSet is non-deterministic in iteration by design, whereas BTreeSet ensures traversal by sorted keys. More on iteration over collections in the documentation of stdlib. I still think that this is the preferred way of resolving this particular issue.

Could you explain the context and purpose of |ts| file_ts.is_codec_free() && ts.is_codec_free()?

is_codec_free basically checks whether the transfer syntax is supported and in native, non-encapsulated, pixel data. And this is where the two transfer syntaxes listed above come in¹: among the presentation context's admitted transfer syntaxes, we are picking either the one which matches the file, or picking one of the two native transfer syntaxes as long as the file is also in a native transfer syntax.

¹ There's also Explicit VR Big Endian, but it is retired and there is increasingly less interest in supporting it

jennydaman commented 3 months ago

Checking I understand you correctly. So both 1.2.840.10008.1.2 or 1.2.840.10008.1.2.1 are correctly accepted, although 1.2.840.10008.1.2 is "preferred?"

Enet4 commented 3 months ago

Checking I understand you correctly. So both 1.2.840.10008.1.2 or 1.2.840.10008.1.2.1 are correctly accepted, although 1.2.840.10008.1.2 is "preferred?"

Well, not by intention, but you are right that there may be something else at play. The intended logic was to always prefer the transfer syntax matching the file's transfer syntax, but I suspect that it could find another codec-free transfer syntax first and pick that instead. If this is the case, then I suppose that we also need a second traversal over the presentation contexts.

We can also add support for letting the user specify whether to prefer explicit VR LE or implicit VR LE.

PierreBou91 commented 1 month ago

@Enet4 After checking I can confirm that the non-deterministic behavior is fixed by changing the HashMap for a BTreeSet

I can make a PR but since it's such a small change I wondered if you had other changes that I could do that could be bundled with it or if you wanted to do it with something else you're working on.

Enet4 commented 1 month ago

Thank you for the initiative @PierreBou91. Changing the map type is one improvement, but we may also benefit from tweaking the client side negotiation algorithm to prefer an exact transfer syntax match when it was offered by the association acceptor.

PierreBou91 commented 1 month ago

Ok this is interesting, just to make sure I'm on the right track, I rephrased below what I understood about the expected change: The objective would be to adapt the check_presentation_contexts function from the storescu so that it prioritize selecting a presentation context with a transfer syntax that exactly matches the file's TS when such an option is available from the SCP.

Would you have in mind the best way to manually test the current behavior and then test the change. I imagine I could:

Is it the expected behavior/process or have I missed/misunderstood something ?

Enet4 commented 1 month ago

@PierreBou91 Yes, that would help us cover the scenario where the association acceptor admits the transfer syntax of the original object to be transferred. The other criterion currently written is to transcode and accept Explicit VR Little Endian otherwise (only when --never-transcode is not enabled).

(Admittedly, a split between library and tool would have helped us write tests for this scenario. This could be a good milestone for the next few months, but we'll see.)