Enet4 / dicom-rs

Rust implementation of the DICOM standard
https://dicom-rs.github.io
Apache License 2.0
403 stars 75 forks source link

PS3.5 H3.2 compliant #445

Open 9enki opened 7 months ago

9enki commented 7 months ago

H3 of PS3.5 indicates that multiple encoding formats are applied within the Patient Name. https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_H.3.html https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_H.3.2.html

Currently, although it is possible to obtain multiple Specification Character Sets as Strs, decoding in the above case seems to decode at the first element of the Specification Character Set.

I would like to be able to decode the Patient Name in this PR in the correct way when it is encoded in multiple encoding formats.

It is likely that PS3.5 will be able to promptly support PS3.5 Chapter I Korean and Chapter J Chinese by utilizing this support.

9enki commented 7 months ago

@Enet4 Hello. I would like to do what we could not do in PR #444 here.

Let me ask you two questions.

9enki commented 7 months ago

If Patient Name does not contain delimiter =, read_value_pns executes read_value_str as before. I found that read_value_pns reads the value from the buffer, so it would be a bug if read_value_strs was executed afterwards. I will come up with a fix for this.

The process after reading information from the buffer of read_value_strs is split into separate processes named read_value_strs_impl, which can be called from both read_value_strs and read_value_pns. https://github.com/Enet4/dicom-rs/pull/445/files#diff-65231185718d602fd3a63db21c6076af076fd9afb6c3d4303c6baeaffa2e1e13R385

9enki commented 7 months ago

(If the answer to the first question is yes) Is there any way to get all the values of a Specification Character Set in a read_value_pns? The current implementation of StatefulDecoder keeps only the first one, even if there are multiple Specific Character Sets, as shown here. Since changing self.text from SpecificCharacterSet to Vec is a big change, if there is another way to get the value of the Specification Character Set I would prefer to use that method. I would prefer to use that method if I can get the Specification Character Set value another way.

9enki commented 7 months ago

@Enet4 Implementation of this PR is complete. I would appreciate a review when you have time.

Enet4 commented 7 months ago

Hello, @9enki! I'll be looking into DICOM-rs matters the upcoming weeks after a small hiatus.

  • I have a function called read_value_pns. I want to execute this function when VR::PN. If Patient Name does not contain delimiter =, read_value_pns executes read_value_strs as before. Is this an acceptable method of implementation for you? If this is a way to deviate from the design concept, I would be glad to get better advice.

The way I see it, the concept of Person Name should include all component groups, rather than just one (alphabetic, ideographic, and phonetic), so they would all sit in a single PrimitiveValue. So, once we find out that we have component groups other than the alphabetical one, we decode each group and merge them back together. From the little I saw of the code so far, that seems to be the intended way to go. In any case, I am likely to give it another look later.

  • Is there any way to get all the values of a Specification Character Set in a read_value_pns? The current implementation of StatefulDecoder keeps only the first one, even if there are multiple Specific Character Sets, as shown here. Since changing self.text from SpecificCharacterSet to Vec<SpecificCharacterSet> is a big change, if there is another way to get the value of the Specification Character Set I would prefer to use that method. I would prefer to use that method if I can get the Specification Character Set value another way.

The main problem is that SpecificCharacterSet is too simple right now. The way forward is to adjust this data type to support more than one character set value. The steps would be something like this:

  1. The existing idea of a SpecificCharacterSet would become a CharacterSetCode (a single encoding).
  2. SpecificCharacterSet is recreated as an opaque structure that can hold up to n codes (depending on which they are; in ISO_IR 192 and some others, only 1 value would be acceptable).

Aside from the fact that one would no longer be able to introspect into the variants available, I think that most of the API would stay intact.

#[derive(Debug)]
pub struct SpecificCharacterSet(SmallVec<[CharacterSetCode; 2]>);

As we are now gathering enough material for a major version (0.7), we can proceed with breaking changes on SpecificCharacterSet and PersonName to fulfill these needs.

If any further guidance is needed, feel free to let me know.

Enet4 commented 3 months ago

Hello again. This contribution is a candidate for inclusion in 0.7.0, but then we need to make the proposed change of changing SpecificCharacterSet so that the type becomes opaque and a single value encompasses the possibility of multiple character set codings. Please let me know if you would be able to fulfill this or you would prefer someone else to take over.

Enet4 commented 2 months ago

Just letting you know that #489 was merged. It made SpecificCharacterSet opaque, so further contributions on text encoding can be brought in without breaking changes to the API.