coreweave / tensorizer

Module, Model, and Tensor Serialization/Deserialization
MIT License

feat(serialization): Use a dynamic `num_readers` by default #122

Closed Eta0 closed 5 months ago

Eta0 commented 5 months ago

Multi-reader deserialization by default

This switches the default number of concurrent readers during deserialization from 1 to a dynamic value based on the file being deserialized and the available system resources.

The logic to choose how many readers to use by default is:

An extra check is also added to verify that reopened files have the same ETag as the original file. A mismatch only emits a log statement rather than raising an error, since ETags are not strictly guaranteed to be stable. The check exists because the initial reader could hypothetically fetch an older cached version of a resource when computing all the metadata, while subsequent readers get a newer version in which their expected offsets are no longer valid. That would be a very confusing error to run into, so a log statement could help diagnose it.
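The ETag comparison described above can be illustrated with a minimal sketch. This is not the code from this PR: the names `get_header` and `etags_match` are hypothetical, and the response headers are assumed to be available as plain string mappings.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("etag_check_sketch")


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Look up a header by name (hypothetical helper, not part of tensorizer)."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def etags_match(
    original: Mapping[str, str], reopened: Mapping[str, str], uri: str
) -> bool:
    """Compare the ETag of a reopened resource against the first read.

    A mismatch is only logged, never raised, because ETags are not
    strictly guaranteed to be stable.
    """
    first = get_header(original, "ETag")
    second = get_header(reopened, "ETag")
    if first is None or second is None:
        # Nothing to compare; treat as inconclusive rather than a mismatch.
        return True
    if first != second:
        logger.warning(
            "ETag for %s changed between reads (%s -> %s); "
            "offsets computed from the first read may be stale",
            uri, first, second,
        )
        return False
    return True
```

Returning a boolean instead of raising keeps the check advisory, which matches the log-only behaviour described above.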

This also pre-emptively raises an error if `num_readers` is explicitly requested to be greater than 1 and the server does not send the `Accept-Ranges: bytes` header. If `num_readers` is dynamic, it simply falls back to 1 instead.
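As a rough sketch of that rule (not the PR's actual implementation): the function name `resolve_num_readers`, the use of `None` to mean "dynamic", the precomputed `dynamic_default` argument, and the choice of `ValueError` are all assumptions made for illustration.

```python
from typing import Mapping, Optional


def supports_byte_ranges(headers: Mapping[str, str]) -> bool:
    """Return True if the response advertises `Accept-Ranges: bytes`."""
    for key, value in headers.items():
        # Header names are case-insensitive; the value may list several units.
        if key.lower() == "accept-ranges":
            units = [unit.strip().lower() for unit in value.split(",")]
            return "bytes" in units
    return False


def resolve_num_readers(
    requested: Optional[int],
    dynamic_default: int,
    headers: Mapping[str, str],
) -> int:
    """Pick the reader count given the server's range support.

    `requested=None` stands in for the dynamic default (an assumption of
    this sketch); `dynamic_default` is whatever value the dynamic logic
    would otherwise have chosen.
    """
    ranged = supports_byte_ranges(headers)
    if requested is None:
        # Dynamic mode: quietly fall back to a single reader.
        return dynamic_default if ranged else 1
    if requested < 1:
        raise ValueError("num_readers must be at least 1")
    if requested > 1 and not ranged:
        # Explicit request that cannot be honoured: fail early.
        raise ValueError(
            "num_readers > 1 requires a server that sends Accept-Ranges: bytes"
        )
    return requested
```

For example, `resolve_num_readers(None, 4, {})` yields a single reader, while `resolve_num_readers(2, 4, {})` raises before any ranged request is attempted.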