coreweave / tensorizer

Module, Model, and Tensor Serialization/Deserialization
MIT License

[3.0] [maybe] More granular future dependencies #152

Open bchess opened 2 weeks ago

bchess commented 2 weeks ago

From @Eta0 in https://github.com/coreweave/tensorizer/pull/127#pullrequestreview-2133569874

Dependencies for later tasks are currently tracked as a single tensor_data_task. We should distinguish write hazards from read hazards: with one combined dependency, two operations that only need to read tensor data cannot run simultaneously (e.g. computing hashes while also writing data to disk, when encryption isn't active). This hurts performance less now that operations are queued in batch order rather than per-tensor, but whenever there are enough threads (or few enough tensors) to handle tasks from multiple stages at once, separate read and write dependencies would unblock the later stages sooner.
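To make the idea concrete, here is a minimal sketch of what split read/write dependency tracking could look like. It is not tensorizer's actual code: `TensorHazards`, `submit_read`, `submit_write`, and the three task functions are hypothetical names, and the "mutation" task merely stands in for any step that modifies tensor data in place. A reader waits only for the last writer, while a writer waits for the last writer and every reader queued since.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, List, Optional


def _run_after(deps: List[Future], fn: Callable[[], None]) -> None:
    # Block until every dependency has finished, then run the task.
    # Dependencies are always submitted earlier, so a FIFO pool cannot deadlock.
    wait(deps)
    fn()


class TensorHazards:
    """Hypothetical per-tensor dependency record (illustration only)."""

    def __init__(self) -> None:
        self.last_write: Optional[Future] = None
        self.reads_since_write: List[Future] = []

    def submit_read(self, pool: ThreadPoolExecutor, fn: Callable[[], None]) -> Future:
        # Read-after-write hazard only: readers never wait for other readers.
        deps = [self.last_write] if self.last_write is not None else []
        fut = pool.submit(_run_after, deps, fn)
        self.reads_since_write.append(fut)
        return fut

    def submit_write(self, pool: ThreadPoolExecutor, fn: Callable[[], None]) -> Future:
        # Write-after-read and write-after-write hazards: wait for everything.
        deps = list(self.reads_since_write)
        if self.last_write is not None:
            deps.append(self.last_write)
        fut = pool.submit(_run_after, deps, fn)
        self.last_write = fut
        self.reads_since_write = []
        return fut


def demo() -> List[str]:
    log: List[str] = []
    # The barrier only releases when both readers are in flight at once,
    # so the demo fails (BrokenBarrierError) if reads were serialized.
    both_reading = threading.Barrier(2, timeout=5)

    def compute_hash() -> None:      # stand-in for a read-only stage
        both_reading.wait()
        log.append("hash")

    def write_to_disk() -> None:     # also only reads the tensor data
        both_reading.wait()
        log.append("disk")

    def mutate_in_place() -> None:   # stand-in for a stage that modifies data
        log.append("mutate")

    hazards = TensorHazards()
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            hazards.submit_read(pool, compute_hash),
            hazards.submit_read(pool, write_to_disk),
            hazards.submit_write(pool, mutate_in_place),
        ]
        for fut in futures:
            fut.result()  # re-raise any task exception
    return log


if __name__ == "__main__":
    print(demo())
```

With a single combined dependency, `write_to_disk` would have to wait for `compute_hash` to finish; here the two overlap and only the mutating task is held back until both are done.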