dcmjs-org / dcmjs

Javascript implementation of DICOM manipulation
https://dcmjs.netlify.com/
MIT License
300 stars 112 forks source link

fix(exports): interpret PN tags as union between object and string #364

Closed dmlambo closed 1 year ago

dmlambo commented 1 year ago

the JSON standard represents PeopleNames as an object, rather than a delineated string. this caused SRs derived from metadata to have [object Object] for the patient name, notable in Google's Healthcare dicom store, which seems to pick a random patient name when searching for series. much of the change is attempting to communicate this union somewhat opaquely, and ensuring objects with additional properties aren't destroyed when being naturalized. tested with OHIF master branch.

dmlambo commented 1 year ago

A little addendum, you can reproduce the issue in OHIF by creating a single measurement and exporting a structured report. Set a breakpoint here (where it does the denaturalizing) and look at the '00100010' (PatientName) tag. It will be "[object Object]".

Finally, this implies a bit of a paradigm shift. In OHIF there are places where it's clear PNs are not handled correcly. Here, for example. With this change, users will have to be careful not to do something like dataSet.PatientName = "John Doe" since it will break the accessors. They will have to instead use them, ie, dataSet.PatientName.Alphabetic = "John Doe". Similarly, the accessors should be used for getting the components in a naturalized dataset. I fear this change could affect people in subtle and hard-to-discover ways.

My change is... kinda sloppy. That's not to say it's sloppily done, rather it reduces cleanliness a fair amount, since it's breaking through some conceptual boundaries. (For instance adding denaturalizing code to ValueRepresentation, sneaking toString and toJSON into native types.) If anyone has a better idea on how to fix it (that doesn't require an epic refactor) I'm all ears.

pieper commented 1 year ago

Thanks for working on this 👍

I'm trying to get my head around the issue - can you elaborate on your comment about derived SR having [object Object] as the Patient Name?

dmlambo commented 1 year ago

Sure thing!

What's happening is when OHIF tries to create a Structured Report, it is pulling instance metadata for the image you are annotating, which is drawn from DICOMWeb as application/dicom+json. The json representation of PN (person name) is not the same as in the binary format (part 10? I'm not super familiar with the naming conventions), as described here.

The Value Field elements of a DICOM attribute with a VR of PN. The non-empty name components of each element are encoded as a JSON strings with the following names:

  • Alphabetic
  • Ideographic
  • Phonetic

Which is to say, when you load a dataset, the Value field of the tag will be different depending on where you load it from. In the case of .dcm, the value will be:

data['00100010'].Value = ["Doe^John==doé^yon"],

whereas if loaded from json, it'll be:

data['00100010'].Value = [{Alphabetic: "Doe^John", Phonetic: "doé^yon"}]

So, in the latter circumstance, if you were write out the dcm file, eventually PersonName.write() would get called, which is derived from EncodedStringRepresentation, which calls toString() on the Value, and thus it writes out "[object Object]", since it is indeed an object, and not a string.

This is a problem because OHIF hydrates its structured reports with json data, thus all PN tags become "[object Object]".

A smaller change would be to just detect the type of Value when writing it from the PersonName class, and that would indeed fix binary output, but json output would be broken in the event someone loaded a binary dcm and wrote json.

I hope it's clearer now!

pieper commented 1 year ago

Thanks for the additional context. This is sounding like more of an OHIF issue than something that should be changed in dcmjs. The reason I say that is that by design a dataset object in dcmjs should be exactly in DICOM JSON Model, so we should be using the object form with the fields for PersonName and not the ^ separated version used in the Part 10 binary.

Can you point to where in OHIF the SR is being created / used that leads to the reported issue?

In dcmjs, the SR derivation is done by constructing a StructuredReport datasets being referenced as a argument:

https://github.com/dcmjs-org/dcmjs/blob/f844e025f86055a918c13269f6d762a711f3a4bf/src/derivations/StructuredReport.js#L4

This inherits from DerivedDataset, where the PatientName is just copied over:

https://github.com/dcmjs-org/dcmjs/blob/f844e025f86055a918c13269f6d762a711f3a4bf/src/derivations/DerivedDataset.js#L69

dmlambo commented 1 year ago

I believe they pull the metadata from here and create the SR just below it, here.

They also do some work[1] to detect[2] what kind of PN tag they're dealing with.

a dataset object in dcmjs should be exactly in DICOM JSON Model

Not according to the link I posted. I can see, for example, in the test mock for a minimum viable naturalized dataset the name is just a string, which is not what the standard describes. And similarly there is currently no conversion done between the object representation and a plain string of the format "A=B=C" when the dataset is written. Indeed, that's what I'm trying to do with my change.

pieper commented 1 year ago

Gotcha, thanks for the links. It does seem that PN doesn't follow the intended pattern, so that's a bug in the original implementation.

Do your suggested changes alter the API or are both string and object values are now supported?

Also, is it possible to separate the style changes so the functional changes are in a single commit that's easier to review?

dmlambo commented 1 year ago

My pleasure! Thanks for the review!

The API hasn't changed, and indeed strings from part10 data and objects from json are supported by the change.

It should be noted in future documentation that after ingestion (before and after naturalization) one should always use the json accessors to PN tags. After denaturalization, the value in the dict will take on this^string=format.

I'll separate out the format changes (thanks, lint-on-save 😄 ) and think about adding a couple more tests.

pieper commented 1 year ago

thanks again for working on this.

Does anyone else want to have a look before this gets finalized? @sedghi maybe?

sedghi commented 1 year ago

@wayfarer3130 any comment here?

dmlambo commented 1 year ago

Any more comments? 🙂

pieper commented 1 year ago

I'll defer to @wayfarer3130 for final review.

dmlambo commented 1 year ago

Incoming. Heh, this one is rather large, though some changes were reverted from before.

Note that as it stands one test will fail, because multiplicity.dcm does not have the correct tags, and I don't have access to the test data repo. Let me know where to send it to get it released!

This change may take a little while to explain and to understand, so take your time looking through it, and I'll answer any questions.

Basically whenever a tag root gets created (in the form of { vr: vr, values: values } or naturalDataset) or when a tag itself gets assigned (the values or Value properties, as well as any natural name property) some accessors will get placed on the object to allow one value to be represented in another way, by way of toJSON and toString. It's a similar concept to the previous changes, but wider scoped.

For instance:

    static naturalizeDataset(dataset) {
        const naturalDataset = ValueRepresentation.addTagAccessors({
            _vrMap: {}
        });

which replaces the dataset with a Proxy which intercepts any property assignment, looks up the propert's ValueRepresentation, and sets additional accessors on the value:

                set(target, prop, value) {
                    var vrType;
                    if (
                        ["values", "Value"].includes(prop) &&
                        target.vr &&
                        ValueRepresentation.hasValueAccessors(target.vr)
                    ) {
                        vrType = ValueRepresentation.createByTypeString(
                            target.vr
                        );
                    } else if (prop in DicomMetaDictionary.nameMap) {
                        vrType = ValueRepresentation.createByTypeString(
                            DicomMetaDictionary.nameMap[prop].vr
                        );
                    } else {
                        target[prop] = value;
                        return true;
                    }

                    target[prop] = vrType.addValueAccessors(value);

                    return true;
                }

This is to ensure if someone assigns a PN incorrectly (IE, without using the json model), we can still output correct JSON, as well as allowing us to convert to part10 given the same scenario.

I'll likely have a few smaller changes coming through, but let me know what you think in the meantime.

dmlambo commented 1 year ago

In the end other changes worked out in the wash, it seems pretty stable to me. Does anyone have any more comments on this change? Anything I can explain better? Thanks!

pieper commented 1 year ago

Thanks for your work on this @dmlambo 👍

From a read-through I believe I'm fine with this but would appreciate if @wayfarer3130 can look at the unresolved conversations.

Also I see a test failure now: https://github.com/dcmjs-org/dcmjs/actions/runs/6178725114/job/17111489516?pr=364

dmlambo commented 1 year ago

Also I see a test failure now: https://github.com/dcmjs-org/dcmjs/actions/runs/6178725114/job/17111489516?pr=364

This is because the multiplicity dcm in the test data repo doesn't have my changes. I've added a PN tag with multiplicity. I don't have permissions to update that file, so someone else would have to do it for me. 🙂

multiplicity.zip

Or if someone wants to modify the existing file, I used DICOM Editor Tool to add this tag:

(0008,1070)Operators' NamePN2Doe^John\Doe^Jane

Thanks!

wayfarer3130 commented 1 year ago

This is looking good now - much happier with it as updated. Could you move the handler out to a single const rather than redeclaring it in the creator? Then I'm happy with it.

pieper commented 1 year ago

Also I see a test failure now: https://github.com/dcmjs-org/dcmjs/actions/runs/6178725114/job/17111489516?pr=364

This is because the multiplicity dcm in the test data repo doesn't have my changes. I've added a PN tag with multiplicity. I don't have permissions to update that file, so someone else would have to do it for me. 🙂

multiplicity.zip

Or if someone wants to modify the existing file, I used DICOM Editor Tool to add this tag:

(0008,1070) Operators' Name PN 2 Doe^John\Doe^Jane Thanks!

I'm happy to put new testing data in the repository, but do you really need to change and existing file? Wouldn't it be better to add a new file and leave existing tests/data in place?

dmlambo commented 1 year ago

I'm happy to put new testing data in the repository, but do you really need to change and existing file? Wouldn't it be better to add a new file and leave existing tests/data in place?

That's up to you; I saw there was a file that dealt with multiplicity, so I used it. If you want to add another file, it would suffice to use the one I attached and rename it to something more relavent, at which point I'll update the PR.

pieper commented 1 year ago

Okay, data published as an asset here: https://github.com/dcmjs-org/data/releases/tag/multiplicity2

dmlambo commented 1 year ago

Rebased on top of master. Should be good to go unless there are any more notes I missed! Thanks for all the feedback everyone!

pieper commented 1 year ago

@wayfarer3130 is this ready to merge? Sounds like it's needed for https://github.com/OHIF/Viewers/pull/3739

@sedghi @jbocce we can merge this as far as I'm concerned.

sedghi commented 1 year ago

Looks like good to go

wayfarer3130 commented 1 year ago

I had approved the PR, yes, I think this is good to go.

github-actions[bot] commented 1 year ago

:tada: This PR is included in version 0.29.11 :tada:

The release is available on:

Your semantic-release bot :package::rocket: