There are a number of issues with muntar at present that are caused by the use of the node-tar module.
One of the primary areas of concern is that there is no (or at least no easy) way to limit the concurrency of file entry events. This means that even when specifying muntar -p 1 to start a single parallel worker, if the tar file contains lots of small files that fit into the available buffer space, many entry events can be generated before the worker gets a chance to pause the stream. Each completed upload then un-pauses the stream, rapidly multiplying the number of in-flight TCP connections and leading to various failure modes.
The tar-stream module has a slightly different API, and provides finer-grained control over the incoming stream: in particular, the next file entry is not read until the handler for the current entry signals that it is finished. This allows us to correctly limit the concurrency of each worker.
In my testing, moving over to tar-stream completely fixes the issue with a large tar file composed of many thousands of small files, but because of the API change it will need to be tested extensively against a wide range of files before switching over.