dcmjs-org / dcmjs

Javascript implementation of DICOM manipulation
https://dcmjs.netlify.com/
MIT License
292 stars 111 forks source link

more efficient buffer allocation #281

Closed vsaase closed 2 years ago

vsaase commented 2 years ago

I was able to speed up writing segmentations for large series (>1000 images) by a factor of 10 in Firefox with this fix The problem was too many buffer copies when the buffer was too small

The same strategy is used in Rust for vector capacity allocation: https://github.com/rust-lang/rust/blob/0ca7f74dbd23a3e8ec491cd3438f490a3ac22741/src/liballoc/raw_vec.rs#L397

JamesAPetts commented 2 years ago

Interesting magic!

I guess this trickles down to a low level optimisation? Could you write a comment about how this improves performance at least? I can see someone niavely reverting this thinking its more complicated/incorrect.

vsaase commented 2 years ago

The old implementation was to increase the size of the buffer only as much as needed when capacity is running out. This is a very costly operation, since memory allocation takes a lot of time.

The idea is that when it is necessary to increase the buffer size, there will likely be more bytes which need to be written to the buffer in the future. So we increase the buffer size right now by a larger amount than necessary, to reserve space for later writes which then can be done much faster. The current size of the buffer is the best estimate of the scale by which the size should increase. So approximately doubling the size of the buffer (while ensuring it fits the new data) is a simple but effective strategy.

JamesAPetts commented 2 years ago

Makes sense, one last consideration, upon serialisation does this affect the file size? We made need to truncate it to the actual length, unless that is dealt with somewhere.

vsaase commented 2 years ago

When the file is written: https://github.com/dcmjs-org/dcmjs/blob/e5667a342d3a54b799479c18bf0d706e1a754ce9/src/DicomDict.js#L46

the getBuffer function is invoked: https://github.com/dcmjs-org/dcmjs/blob/e5667a342d3a54b799479c18bf0d706e1a754ce9/src/BufferStream.js#L293

which returns a slice that is limited by this.size or a provided end.

So no, the actual byteLength of the buffer is not used downstream as the size is tracked by the size member variable.

JamesAPetts commented 2 years ago

Awesome, I thought as such, thanks for taking the time to check this.

Happy to merge this, thanks for the contribution.

cc @pieper .

pieper commented 2 years ago

Yes, makes sense - thanks @JamesAPetts and @vsaase 👍