coreweave / tensorizer

Module, Model, and Tensor Serialization/Deserialization
MIT License

(feat) add `buffer_size` argument to `CURLStreamFile`, adjust to `2mb`, add `benchmark.py` #49

Closed: wbrown closed this 1 year ago

wbrown commented 1 year ago

This adds a `buffer_size` argument to `CURLStreamFile` so that we can experiment with and tune the `Popen` buffer size. The buffer size appears to affect transfer speed. If the buffer is too large, it fills up and the TCP socket then exerts backpressure for a while. If it is too small, we spend time waiting for the buffer to be repopulated.
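The diff itself isn't reproduced here, so the following is only a minimal sketch of the idea, not tensorizer's actual `CURLStreamFile`. It shows how a `buffer_size` argument can be forwarded to `subprocess.Popen(bufsize=...)`. The class name `StreamFileSketch` and the constant `DEFAULT_BUFFER_SIZE` are hypothetical, "2mb" is assumed to mean 2 MiB, and a Python child process stands in for `curl` so the demo runs offline.

```python
import subprocess
import sys
from typing import Sequence

# Assumption: "2mb" read as 2 MiB; the real default lives in tensorizer's source.
DEFAULT_BUFFER_SIZE = 2 * 1024 * 1024


class StreamFileSketch:
    """Hypothetical file-like reader over a child process's stdout.

    Not tensorizer's CURLStreamFile: it only shows how a `buffer_size`
    argument can be forwarded to `subprocess.Popen(bufsize=...)`.
    """

    def __init__(self, cmd: Sequence[str], buffer_size: int = DEFAULT_BUFFER_SIZE):
        self.buffer_size = buffer_size
        # In binary mode, a positive bufsize sets the size of the
        # io.BufferedReader that Python wraps around the stdout pipe.
        self._proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, bufsize=buffer_size
        )

    def read(self, size: int = -1) -> bytes:
        # Blocks until `size` bytes are available or EOF; returns b"" at EOF.
        return self._proc.stdout.read(size)

    def close(self) -> None:
        self._proc.stdout.close()
        self._proc.wait()

    def __enter__(self) -> "StreamFileSketch":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


# Demo: a Python child process stands in for `curl`, emitting 5,000,000 zero bytes.
cmd = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(bytes(5_000_000))"]
with StreamFileSketch(cmd) as stream:
    total = 0
    while chunk := stream.read(256 * 1024):
        total += len(chunk)
print(total)  # 5000000
```

Note that `bufsize` sizes the user-space reader in the parent process; the OS pipe between the child and the parent has its own fixed capacity, which is what ultimately stalls the writer when nobody is reading.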

In this spirit, `benchmark.py` was written to measure the impact of read sizes and buffer sizes. Preliminary results suggest that 2 MB currently performs best, so the default `Popen` buffer size has been set to 2 MB.
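`benchmark.py` is not shown in this description, so what follows is a hypothetical sketch of the kind of sweep it describes, not its actual contents. The helper names `time_transfer` and `sweep` are invented for illustration. The sketch times draining a child process's stdout for every combination of `Popen` buffer size and read size and reports throughput in MiB/s. A local child process stands in for a `curl` download, so it will not reproduce TCP backpressure; to measure the effect described above, `cmd` would need to point at a real network transfer.

```python
import itertools
import subprocess
import sys
import time

MiB = 1024 * 1024


def time_transfer(cmd, buffer_size, read_size):
    """Drain cmd's stdout in read_size chunks; return (bytes_read, seconds)."""
    # Timing includes child startup, which is roughly constant across configs.
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=buffer_size)
    total = 0
    with proc.stdout:
        while chunk := proc.stdout.read(read_size):
            total += len(chunk)
    proc.wait()
    return total, time.perf_counter() - start


def sweep(cmd, buffer_sizes, read_sizes):
    """Time every (buffer_size, read_size) pair; map each to (bytes, MiB/s)."""
    results = {}
    for buffer_size, read_size in itertools.product(buffer_sizes, read_sizes):
        nbytes, seconds = time_transfer(cmd, buffer_size, read_size)
        results[(buffer_size, read_size)] = (nbytes, nbytes / seconds / MiB)
    return results


if __name__ == "__main__":
    # A local child process stands in for a network download via curl.
    payload = 64 * MiB
    cmd = [sys.executable, "-c", f"import sys; sys.stdout.buffer.write(bytes({payload}))"]
    results = sweep(cmd, [256 * 1024, 2 * MiB, 8 * MiB], [64 * 1024, MiB, 4 * MiB])
    # Print configurations from fastest to slowest.
    for (buf, rd), (_, rate) in sorted(results.items(), key=lambda kv: -kv[1][1]):
        print(f"buffer={buf // 1024:>5} KiB  read={rd // 1024:>5} KiB  {rate:8.1f} MiB/s")
```

A single run is noisy; repeating each configuration several times and comparing medians would give a more trustworthy ranking before changing a default.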