dcmjs-org / dcmjs

Javascript implementation of DICOM manipulation
https://dcmjs.netlify.com/
MIT License
292 stars 111 forks source link

naturalizeDataset in v0.18.11 creates in memory objects of 300mb+ #263

Closed dbousamra closed 2 years ago

dbousamra commented 2 years ago

Hi all.

I upgraded dcmjs, and I noticed calls to naturalizeDataset were taking longer. Looking further into it, writing the response to disk gives files of 300mb+, whereas in 0.18.10 (and prior), they were ~ 4kb. The file is full of thousands of null's.

pieper commented 2 years ago

That's interesting. Can you reproduce this with any data you can share?

dbousamra commented 2 years ago

Here is a code sandbox with a simple reproduction: https://codesandbox.io/s/recursing-booth-61xnc5?file=/src/index.ts

If you notice, the output is: JSON lengths { rawTags: 5149, naturalizedTags: 300153806 }. The naturalizedTags are being filled with many null values (presumably tags that aren't in my file, but dcmjs is eagerly appending).

If you switch the version back to pre 0.18.11, you will see output is: JSON lengths { rawTags: 5149, naturalizedTags: 3734 }. The null values aren't present.

I hope this helps you debug the issue. Please let me know if you need anything else.

pieper commented 2 years ago

Thanks for the code sandbox but I keep getting this 502 bad gateway error. Does this also happen for you? Is there another way to test the code?

image

pieper commented 2 years ago

If we have a simple node script that can reproduce and detect the error someone could try git bisect on it.

I'm not sure what change would lead to this issue, but I'm wonder if it's an unintended consequence of the accessors. @wayfarer3130 could you have a look?

https://github.com/dcmjs-org/dcmjs/compare/v0.18.1...v0.21.0#diff-a70070702108a92f492970cff5b5ef645d5c33c56c01a775d65063e9bdbefa7aR157

dbousamra commented 2 years ago

@pieper try just Ctrl-S/Cmd-S in the editor. It will force a save, and you should see the script run

dbousamra commented 2 years ago

I believe the issue was introduced in this changeset: https://github.com/dcmjs-org/dcmjs/compare/v0.18.10...v0.18.11

pieper commented 2 years ago

Thanks for narrowing this down @dbousamra, yes, that's the commit I was thinking might at the heart of it.

@wayfarer3130 should have a look. Or if anyone else has ideas, or wants to contribute a fix or a test to detect this case so that we'll know when it's fixed it will stay fixed that would be great.

wayfarer3130 commented 2 years ago

@pieper - I'm looking into this. The naturalized dataset will be somewhat larger than the original dataset because the data is accessible on multiple paths, but it shouldn't be this much larger. I will try to figure out what is going wrong.

github-actions[bot] commented 2 years ago

:tada: This issue has been resolved in version 0.22.1 :tada:

The release is available on:

Your semantic-release bot :package::rocket: