alanshaw / stream-to-it

🚰 Convert Node.js streams to streaming iterables
Other
18 stars 6 forks source link

Memory leak in sink #6

Closed betamos closed 4 years ago

betamos commented 4 years ago

It looks like sink has a memory leak. Repro:

const fs = require('fs')
const toIterable = require('stream-to-it')
const pipe = require('it-pipe')

async function* run() {
  for (let i = 0; i < 170000; i++) {
    yield Buffer.alloc(4096)
  }
}

(async () => {
  const sink = toIterable.sink(fs.createWriteStream('/some/outfile'))
  await pipe(run(), sink)
  console.log('done')
  await new Promise(resolve => setTimeout(resolve, 1000000));
})()

Memory consumption steadily increases until completion and stays at around 800-900 Mb. Total size of those buffers are around 700 Mb. Early diagnostics with chrome webtools point towards the promises and/or the callback hijacking of stream.end().

Found when using the upstream libp2p-js with libp2p-js-tcp.

node v13.11.0 Linux 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

betamos commented 4 years ago

Played around a bit today and seems like the root cause is related to the V8 implementation (or even the spec according to the literals) of Promise.race. PTAL at streaming-iterables who worked around this very issue.

I confirmed that when substituting stream-to-it/sink with streaming-iterables/writeToStream, memory consumption was back to baseline (less than 20 Mb on my machine).

alanshaw commented 4 years ago

Thanks for the diagnosis @betamos - any ideas for how to fix?

betamos commented 4 years ago

I played around a bit with the code but didn't quite get it to work, so I ended up hot-swapping sink for streaming-iterables/writeToStream instead. I then used this in my libp2p stack and it didn't cause any problems so I called it a day.

I know that there are more features in this module than in writeToStream but I couldn't find much in terms of how they were actually supposed to work, so there was a good chance I'd break something upstream that depended on it.

I think the key is to create an internal source which yields both the messages from the original source and control messages (like error and finish), generated from plain old callbacks. It seems like one has to avoid Promise.race altogether under the current V8 implementation. I don't think it was misused or even used in an unusual fashion here.