dcmjs-org / dcmjs

Javascript implementation of DICOM manipulation
https://dcmjs.netlify.com/
MIT License
291 stars 110 forks source link

DicomMessage.readFile failes if GroupLength parameter is wrong calculated #338

Closed kursawero closed 1 year ago

kursawero commented 1 year ago

We started with a JavaScript ( Uploader) project where we want to provide the functionality that DICOM files can be selected, pseudomized and uploaded from a clinical environment to research infrastructure. An important side condition is that the user should not be enforced to install additional tools.

All in all, we got the prototype running by using dcmjs. Unfortunately, some real clinical data sets make problems. It seems that the readFile method fails if the the GroupLength attribute is wrong. The DICOM standard says that the Group Length attribute is actually retired.

Is there an option to deactivate using the GroupLength tag in dcmjs or to recalculate the tag within the JavaScript world?

I already tried calling the method with the "ignoreErrors: true" option, but then we miss the DICOM tags from the group where the GroupLength is wrong.

Running the command line tool "dcmconv" from DCMTK would fix the problem by recalculating the GroupLength, but not all users would be able to handle that. There is also another older project, the [cross-compilation of dcmtk] (https://github.com/commontk/dcmjs). It would allow to run "dcmconv" in a JavaScript environment, but it seems the project is older and the scripts needs to be adapted to build the specific js library. Based on the discussion Any on going work on this project?, I was thinking starting here to ask for help would make sense.

From the DICOM standard perspective, I think, it would be better to not rely on the GroupLength tag since it is already retired.

Reproducing the issue is simple: Reduce the value of the GroupLength attribute of a group in the DICOM tag and then run:

const arrayBuffer = fs.readFileSync("dicomFileName.dcm").buffer;
const dicomDict = DicomMessage.readFile(arrayBuffer);

You should see an error message like:

RangeError: Offset is outside the bounds of the DataView
        at DataView.getUint16 (<anonymous>)

      131 |
      132 |     readUint16() {
    > 133 |         var val = this.view.getUint16(this.offset, this.isLittleEndian);
          |                             ^
      134 |         this.increment(2);
      135 |         return val;
      136 |     }

      at ReadBufferStream.readUint16 (src/BufferStream.js:133:29)
      at SequenceOfItems.readBytes (src/ValueRepresentation.js:777:44)
      at SequenceOfItems.read (src/ValueRepresentation.js:57:21)
      at Function._readTag (src/DicomMessage.js:334:26)
      at Function._read (src/DicomMessage.js:115:47)
      at Function.readFile (src/DicomMessage.js:229:36)
      at Object.<anonymous> (test/rpbfiles.test.js:15:36)

Any help is really appreciated.

pieper commented 1 year ago

Hi - thanks for reporting and linking to your RPB work. Your uploader use case is exactly the kind of thing dcmjs is intended to support, but yes, the "real world" of dicom is complex and often frustrating.

The older commontk effort to cross compile dcmtk should still work, and in fact if it needs to be updated it is probably much easier to implement today than it was when we first did the experiment. The plan was exactly to take advantage of decades of battle hardening that the dcmtk team put into the package so it's probably the most robust solution. It's a fairly big package by javascript standards and slower so you could load it only if you run into problematic data that the pure js dcmjs can't handle.

That said, it's also probably not hard to fix the javascript code so that it works for this data. If you want to provide some code with tests that would be great.

I try to keep an eye on these projects but I'm not actively developing either one so they rely on community collaboration to keep them going.

kursawero commented 1 year ago

Hi, thanks for your answer. We dig a little bit deeper and found a way how it could be fixed in the Javascript code. I added a pull request this.

It seems that we have sometimes some binary data in private tags in our data sets that have byte sequences which are similar to some tags that are used to detect the end of the sequence. The issue is gone by using tag and length, which are defined by the standard. Then we have 8 byte instead of 4 which we compare and it is more unlikely to match that sequence with binary data.

The "dcmconv" command line tool just calculated the item length and converted from undefined length items to defined length items that then did not pass the specific part of the code. That means at the end that the command line tool most likely also uses tag and length attributes for detection or it evaluates the "Sequence Delimitation Item", which is here just used in the outer while loop.

It would be great if someone could review and help getting the pull request merged.

kursawero commented 1 year ago

I think, we can close the ticket, since the pull request has been merged to master. Thank you very much for the quick review.

It seems that the semantic release automation did not work (link to the failed job), because of the capitalized "F" in fix():. Could you please check that?

pieper commented 1 year ago

Okay, let me try another commit.