coreweave / tensorizer

Module, Model, and Tensor Serialization/Deserialization
MIT License

[3.0] Use fallocate in incremental mode #150

Open: bchess opened this issue 3 months ago

bchess commented 3 months ago

Regarding

        if not incremental:
            write_dependency = self._maybe_fallocate(write_specs)

Eta0 (last week): As far as I can tell, this is the only place this flag is used, and it doesn't actually indicate whether the write is incremental. It only indicates whether the call came from write_tensor, so the flag could be renamed for clarity. That aside, it's still fine to call fallocate for a single tensor that is large enough: the allocation can proceed in the background during tensor and metadata preparation, including hash calculation.
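A minimal sketch of that overlap for a single-tensor write. The helper names `_preallocate` and `write_single_tensor` are hypothetical, SHA-256 stands in for whatever hash the writer actually computes, and `os.posix_fallocate` is Unix-only, so the sketch falls back to `os.ftruncate` (which only grows the logical size) where it is missing.

```python
import concurrent.futures
import hashlib
import os


def _preallocate(fd: int, offset: int, length: int) -> None:
    """Hypothetical helper: reserve [offset, offset + length) in the file."""
    if length <= 0:
        return  # posix_fallocate rejects non-positive lengths
    if hasattr(os, "posix_fallocate"):
        os.posix_fallocate(fd, offset, length)
    elif os.fstat(fd).st_size < offset + length:
        # Fallback: grow the logical size only (the file may stay sparse).
        os.ftruncate(fd, offset + length)


def write_single_tensor(path: str, payload: bytes) -> str:
    """Hypothetical single-tensor write: preallocate in a background thread
    while the hash is computed on the calling thread."""
    with open(path, "wb") as f, concurrent.futures.ThreadPoolExecutor(1) as pool:
        offset = f.tell()
        # Submit the preallocation first so it overlaps with hashing.
        write_dependency = pool.submit(_preallocate, f.fileno(), offset, len(payload))
        digest = hashlib.sha256(payload).hexdigest()
        # Only the actual write has to wait for the allocation to finish.
        write_dependency.result()
        f.write(payload)
    return digest
```

The only synchronization point is `write_dependency.result()`, so any preparation placed before it (hashing, header or metadata construction) runs concurrently with the allocation.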

bchess (2 days ago): Yeah, but the logic inside _maybe_fallocate() uses its list of tensors to calculate the total size. To work in incremental mode, it would need to be updated to account for past writes.
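To make the size concern concrete, here is a sketch with a hypothetical `WriteSpec` that carries only a byte count. Real write specs also cover headers and metadata, which this ignores. Summing only the current batch gives the right target size for a one-shot write, but it undershoots on later incremental calls unless the bytes from earlier writes are added in.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WriteSpec:
    """Hypothetical stand-in for a write spec; only the size matters here."""
    name: str
    size: int  # bytes this tensor will occupy in the file


def naive_total_size(write_specs: Sequence[WriteSpec]) -> int:
    # Covers the whole file only if every tensor is written in one batch.
    return sum(spec.size for spec in write_specs)


def incremental_total_size(
    write_specs: Sequence[WriteSpec], bytes_already_written: int
) -> int:
    # Earlier write_tensor() calls already occupy the start of the file,
    # so the target size has to include them.
    return bytes_already_written + naive_total_size(write_specs)
```

For example, after a first batch of 100 + 200 bytes, a second `write_tensor()` call for a 50-byte tensor yields 50 from the naive sum but 350 once the 300 earlier bytes are counted.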

Eta0 commented 3 months ago

Copying another of my comments from that thread (https://github.com/coreweave/tensorizer/pull/127#discussion_r1657800179):

fallocate extends the file relative to the current file position (when that position is passed as the offset), so it doesn't need anything special to account for past writes as long as the file position is reliable; it can simply extend the file progressively.
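A minimal sketch of that progressive approach, assuming a binary file object whose `tell()` marks the end of the data written so far. `extend_from_current_position` is a hypothetical helper, not tensorizer's actual code. `os.posix_fallocate` takes an explicit offset, grows the file to at least `offset + len`, and does not move the file position.

```python
import os


def extend_from_current_position(f, nbytes: int) -> None:
    """Hypothetical helper: reserve nbytes starting at f's current position."""
    if nbytes <= 0:
        return  # posix_fallocate rejects non-positive lengths
    f.flush()  # push buffered data to disk so the file contents match f.tell()
    offset = f.tell()
    if hasattr(os, "posix_fallocate"):
        # Unix: allocate blocks; the file grows to at least offset + nbytes.
        os.posix_fallocate(f.fileno(), offset, nbytes)
    elif os.fstat(f.fileno()).st_size < offset + nbytes:
        # Fallback where posix_fallocate is unavailable: grow the logical size.
        os.ftruncate(f.fileno(), offset + nbytes)
```

Each call needs only the size of the next batch. Earlier writes are accounted for implicitly by the offset, which is the bookkeeping-free property described above.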