eth-easl / orion

An interference-aware scheduler for fine-grained GPU sharing
MIT License

Allow different streams to share cached memory #5

[Open] XianzheMa opened this issue 1 year ago

XianzheMa commented 1 year ago

PyTorch's caching allocator does not allow one stream to reuse free memory that is cached for another stream. This hinders efficient and flexible memory usage when the GPU is shared among multiple workloads, each issuing work on its own stream. We need to address this issue.
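A minimal sketch of the behavior (assuming a CUDA-capable GPU and a recent PyTorch; the tensor size is arbitrary and exact numbers will vary): freeing a tensor allocated on one stream leaves the block cached for that stream, so an equally sized allocation on a second stream cannot reuse it and `torch.cuda.memory_reserved()` grows instead of staying flat.

```python
import torch

stream_a = torch.cuda.Stream()
stream_b = torch.cuda.Stream()

# Allocate and free ~256 MiB on stream A; the freed block stays in the
# caching allocator, earmarked for stream A.
with torch.cuda.stream(stream_a):
    x = torch.empty(64, 1024, 1024, device="cuda")
del x
torch.cuda.synchronize()
print("reserved after free on stream A:",
      torch.cuda.memory_reserved() // 2**20, "MiB")

# The same-sized allocation on stream B cannot reuse A's cached block,
# so the allocator requests new memory from the driver and the reserved
# total roughly doubles.
with torch.cuda.stream(stream_b):
    y = torch.empty(64, 1024, 1024, device="cuda")
torch.cuda.synchronize()
print("reserved after alloc on stream B:",
      torch.cuda.memory_reserved() // 2**20, "MiB")
```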

XianzheMa commented 1 year ago

Source: https://github.com/pytorch/pytorch/blob/91cce4c09aef38b1cb5be5d4a9d55845aeff02b3/c10/cuda/CUDACachingAllocator.cpp#L39

Relevant discussion: https://discuss.pytorch.org/t/why-gpu-memory-allocations-are-associated-with-the-cuda-stream/63122
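For reference, a coarse interim workaround (not the fix this issue asks for) is `torch.cuda.empty_cache()`, which returns all currently unused cached blocks to the driver so that later allocations on any stream, or even another process, can use that memory, at the cost of going through cudaMalloc again on the next allocation. A sketch:

```python
import torch

stream_a, stream_b = torch.cuda.Stream(), torch.cuda.Stream()

with torch.cuda.stream(stream_a):
    x = torch.empty(64, 1024, 1024, device="cuda")
del x
torch.cuda.synchronize()

# Release all *unused* cached blocks back to the driver; memory freed on
# stream A becomes available to any stream (or another process), but the
# next allocation pays for a fresh cudaMalloc.
torch.cuda.empty_cache()

with torch.cuda.stream(stream_b):
    y = torch.empty(64, 1024, 1024, device="cuda")
```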