[3.0] [maybe] More granular future dependencies

From @Eta0 in https://github.com/coreweave/tensorizer/pull/127#pullrequestreview-2133569874

Dependencies for later tasks are tracked as a single tensor_data_task. We should differentiate between write hazards and read hazards, since storing them as one item prevents two operations that only need to read tensor data from running simultaneously (e.g. computing hashes while also writing data to disk, when encryption isn't active). This is less harmful to performance when operations are queued in batch order, as they are now, rather than per-tensor. However, whenever there are enough threads (or few enough tensors) to handle tasks from multiple stages at once, this change would unblock the later stages sooner.
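
To make the idea concrete, here is a rough sketch of what separate read and write hazard tracking could look like. It is not tensorizer's actual code: `TensorDataHazards`, `add_read`, and `add_write` are hypothetical names invented for illustration, and treating an in-place encryption step as the mutating operation is an assumption. The only real API used is `concurrent.futures.Future`.

```python
from concurrent.futures import Future
from typing import List, Optional


class TensorDataHazards:
    """Hypothetical per-tensor dependency record (not tensorizer's API)."""

    def __init__(self) -> None:
        # Most recent task that mutates the tensor data, if any.
        self._last_write: Optional[Future] = None
        # Read-only tasks registered since that write.
        self._reads_since_write: List[Future] = []

    def add_read(self, task: Future) -> List[Future]:
        """Register a read-only task and return the futures it must wait on."""
        # A reader only needs the last write to finish; it never waits
        # on other readers, so reads can overlap.
        deps = [] if self._last_write is None else [self._last_write]
        self._reads_since_write.append(task)
        return deps

    def add_write(self, task: Future) -> List[Future]:
        """Register a mutating task and return the futures it must wait on."""
        # A writer waits for the previous write and for every read
        # registered since then.
        deps = [] if self._last_write is None else [self._last_write]
        deps.extend(self._reads_since_write)
        self._last_write = task
        self._reads_since_write = []
        return deps


hazards = TensorDataHazards()
hash_task, write_to_disk_task, encrypt_task, later_read = (
    Future(), Future(), Future(), Future()
)

# Hashing and writing to disk both only read the tensor data.
hash_deps = hazards.add_read(hash_task)
disk_deps = hazards.add_read(write_to_disk_task)
# Assumed mutating step: it must wait for both readers.
encrypt_deps = hazards.add_write(encrypt_task)
# Any later reader waits only for the mutation.
later_deps = hazards.add_read(later_read)
```

In this sketch, the two readers get empty dependency lists and so could be scheduled at the same time, whereas a single shared dependency would force one to wait for the other.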

[3.0] [maybe] More granular future dependencies #152