## Multi-reader deserialization by default

This switches the default number of concurrent readers during deserialization from 1 to a dynamic value based on the file being deserialized and the available system resources.
The logic to choose how many readers to use by default is (see the sketch after this list):

- Anything that the code doesn't know how to reopen uses 1, otherwise,
- Local files use 8, otherwise,
- Anything using hash verification uses 4, otherwise,
- Anything else uses 2
- If a `CURLStreamFile`'s headers do not include `Accept-Ranges: bytes`, use 1
- Use fewer readers if the number picked is expected to overflow RAM
  - E.g. if only the top 4 largest tensors are expected to fit in memory at one time, don't default to more than 4 readers
  - This check is still bypassed if a specific `num_readers` is used
- Never use more readers than there are tensors in the file
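As a rough sketch, the cascade above might fit together as follows. Every name here (the function, its parameters, and how RAM headroom is measured) is a placeholder for illustration, not tensorizer's actual implementation:

```python
from typing import List


def choose_default_num_readers(
    file_is_reopenable: bool,
    is_local_file: bool,
    uses_hash_verification: bool,
    accepts_byte_ranges: bool,
    tensor_sizes: List[int],  # tensor sizes in bytes, sorted largest first
    available_ram: int,       # bytes of RAM usable for read buffers
) -> int:
    # Anything the code doesn't know how to reopen can only use one reader
    if not file_is_reopenable:
        return 1
    if is_local_file:
        readers = 8
    elif not accepts_byte_ranges:
        # A remote file whose response lacks Accept-Ranges: bytes can't be
        # read from multiple offsets at once
        return 1
    elif uses_hash_verification:
        readers = 4
    else:
        readers = 2
    # Use fewer readers if that many concurrent reads would overflow RAM:
    # count how many of the largest tensors fit in memory at one time
    used, fit = 0, 0
    for size in tensor_sizes:
        if used + size > available_ram:
            break
        used += size
        fit += 1
    readers = min(readers, max(fit, 1))
    # Never use more readers than there are tensors in the file
    return max(1, min(readers, len(tensor_sizes)))
```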
An extra check is also added that reopened files have an ETag matching that of the original file, though it only emits a log statement if the check fails (since ETags are not strictly guaranteed to be stable). This is because the initial reader could hypothetically fetch an older, cached version of a resource while computing all the metadata, while subsequent readers get a newer version in which their expected offsets are no longer valid. Since that would be a very confusing error if it came up, a log statement could help diagnose it.
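A minimal sketch of what such a check could look like, assuming a hypothetical check_reopened_etag helper that receives the raw header values; this is not the library's actual code:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def check_reopened_etag(
    original_etag: Optional[str], reopened_etag: Optional[str]
) -> None:
    # ETags aren't strictly guaranteed to be stable, so a mismatch is only
    # logged, never raised as an error
    if original_etag and reopened_etag and original_etag != reopened_etag:
        logger.warning(
            "ETag changed between readers (%r -> %r); the resource may have"
            " been updated mid-deserialization, invalidating tensor offsets",
            original_etag,
            reopened_etag,
        )
```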
This also pre-emptively raises an error if `num_readers` is specifically requested to be greater than 1 and the server does not send the `Accept-Ranges: bytes` header, whereas if `num_readers` is dynamic, it simply falls back to 1.
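A minimal sketch of that distinction, assuming a hypothetical resolve_num_readers helper (the function and parameter names are placeholders, and the dynamic branch would really defer to the heuristic sketched earlier):

```python
from typing import Optional


def resolve_num_readers(
    requested: Optional[int], accept_ranges_header: Optional[str]
) -> int:
    """Validate an explicit num_readers, or pick a fallback for the dynamic case."""
    supports_ranges = (accept_ranges_header or "").strip().lower() == "bytes"
    if requested is not None:
        if requested > 1 and not supports_ranges:
            # An explicit request for multiple readers can't be honored
            # without HTTP range support, so fail loudly and early
            raise ValueError(
                "num_readers > 1 requires the server to send"
                " an Accept-Ranges: bytes header"
            )
        return requested
    # Dynamic default: quietly reset to a single reader when ranges aren't
    # supported; otherwise the heuristic sketched earlier would pick a value
    return 1 if not supports_ranges else 2  # 2 stands in for that heuristic
```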