TritonDataCenter / node-manta

Node.js SDK for Manta

Switch muntar over to use tar-stream #290

Closed jperkin closed 1 year ago

jperkin commented 7 years ago

There are a number of issues with muntar at present that are caused by the use of the node-tar module.

One of the primary areas of concern is that there is no (or at least no easy) way to limit the concurrency of file entry events. This means that even when specifying muntar -p 1 to run a single worker, if the tar file contains lots of small files that fit into the available buffer space, many entry events can be emitted before the worker gets a chance to pause the stream. Each completed upload then un-pauses the stream, exponentially increasing the number of TCP connections and leading to various failure modes.

The tar-stream module has a slightly different API and gives finer control over the incoming stream, for example by letting the handler for the current entry decide when the next file entry is read. This allows us to correctly limit the concurrency of each worker.
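
To illustrate the gating idea, here is a minimal, dependency-free sketch. It does not use tar-stream itself: the handler signature (header, stream, next) is modelled on tar-stream's extract 'entry' event, while gatedSource, uploadEntry and run are hypothetical names invented for this example, and the upload is simulated with a timer. The point it shows is that the source delivers the next entry only after the current handler calls next(), so a single worker never has more than one upload in flight.

```javascript
const { EventEmitter } = require('node:events');
const { Readable } = require('node:stream');

// Hypothetical stand-in for a tar extractor. It emits the next 'entry'
// only after the handler for the previous entry has called next().
function gatedSource(names) {
  const source = new EventEmitter();
  let index = 0;

  function next() {
    if (index === names.length) {
      source.emit('finish');
      return;
    }
    const header = { name: names[index++] };
    // An empty stream stands in for the entry's file contents.
    source.emit('entry', header, Readable.from([]), next);
  }

  source.start = () => setImmediate(next);
  return source;
}

// Hypothetical stand-in for uploading one entry; completes asynchronously.
function uploadEntry(header, stream, done) {
  stream.resume(); // drain the (empty) entry body
  setTimeout(done, 5);
}

// Process every entry and report the peak number of concurrent uploads.
function run(names, callback) {
  let inFlight = 0;
  let maxInFlight = 0;
  const uploaded = [];
  const source = gatedSource(names);

  source.on('entry', (header, stream, next) => {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    uploadEntry(header, stream, () => {
      inFlight--;
      uploaded.push(header.name);
      next(); // only now may the source deliver another entry
    });
  });

  source.on('finish', () => callback({ uploaded, maxInFlight }));
  source.start();
}
```

Calling run(['a', 'b', 'c'], console.log) reports the entries in archive order together with the peak concurrency. Supporting -p N with N greater than 1 would need extra bookkeeping on top of this, which is not shown here.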

In my testing, moving over to tar-stream completely fixes the issue with a large tar file composed of many thousands of small files, but because of the API change it will need to be tested extensively on a wide range of files before switching over.

bahamat commented 1 year ago

As of #396, muntar is deprecated and will be removed in a future release.