coreweave / tensorizer

Module, Model, and Tensor Serialization/Deserialization
MIT License

[3.0] Optional syncing #153

Open · bchess opened this issue 2 weeks ago

bchess commented 2 weeks ago

From @Eta0's review in https://github.com/coreweave/tensorizer/pull/127#pullrequestreview-2133569874:

We could add a `synchronize=False` mode for incremental writes (here "syncing" means syncing data to disk) that defers writing headers until a write occurs with `synchronize=True` or the file is finalized. In some cases, such as writing to a temporary file before uploading it to S3, header writes can always be deferred. More generally, this could improve performance whenever constant synchronization isn't needed: it guarantees a single write for the entire header section, and it could (potentially) reduce fiddling with encryption information in the header section, since data stored there (like MACs) would no longer have to be edited repeatedly.
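To make the proposal concrete, here is a minimal sketch of the deferred-header idea in plain Python. It is not tensorizer's API: `DeferredHeaderWriter`, `write_tensor`, `HEADER_SIZE`, and the JSON header layout are all hypothetical, and encryption/MACs are left out entirely. Only the `synchronize` flag and the "defer until synced or finalized" behavior come from the proposal.

```python
import io
import json
import os
import tempfile


class DeferredHeaderWriter:
    """Hypothetical toy writer (not tensorizer's API): a fixed-size header
    region at the start of the file, followed by appended tensor payloads.
    Header updates are deferred unless synchronize=True or close() is called."""

    HEADER_SIZE = 4096  # hypothetical fixed reservation for header metadata

    def __init__(self, path):
        self._f = open(path, "wb+")
        # Reserve the header region up front so data offsets never shift.
        self._f.write(b"\x00" * self.HEADER_SIZE)
        self._entries = []       # in-memory header state (name, offset, length)
        self._dirty = False      # True when in-memory header differs from disk
        self.header_writes = 0   # how many times the header region was rewritten

    def write_tensor(self, name, data, synchronize=False):
        # Append the payload after everything written so far.
        self._f.seek(0, io.SEEK_END)
        offset = self._f.tell()
        self._f.write(data)
        self._entries.append({"name": name, "offset": offset, "length": len(data)})
        self._dirty = True
        if synchronize:
            self._sync()

    def _sync(self):
        # Write the header once for all pending entries, then fsync to disk.
        if not self._dirty:
            return
        header = json.dumps(self._entries).encode()
        if len(header) > self.HEADER_SIZE:
            raise ValueError("header region too small")
        self._f.seek(0)
        self._f.write(header.ljust(self.HEADER_SIZE, b"\x00"))
        self._f.flush()
        os.fsync(self._f.fileno())
        self.header_writes += 1
        self._dirty = False

    def close(self):
        # Finalizing always flushes any deferred header state.
        self._sync()
        self._f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.bin")
    w = DeferredHeaderWriter(path)
    for i in range(3):
        w.write_tensor(f"layer{i}", bytes([i]) * 16)  # header write deferred
    w.close()
    print(w.header_writes)  # prints 1
```

With the default `synchronize=False`, the header region is written exactly once, at `close()`, regardless of how many tensors were appended. Passing `synchronize=True` instead keeps the on-disk header consistent after every write, at the cost of one header rewrite (and fsync) per call. In a real encrypted format, each of those rewrites would also mean recomputing header MACs, which is the overhead the proposal hopes to avoid.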