Hi there, I really like this LMS, a real lifesaver for large model training 👍
However, I want to understand how "speculative page-out" works under the hood, especially how LMS decides whether a memory block should be paged out speculatively, and what's the intuition behind this decision. I believe the answer lies in this function (https://github.com/mtbrandy/pytorch/blob/7f5342e873d39cdcb6978facb258cf730c309bcf/c10/cuda/CUDACachingAllocator.cpp#L291), but it is really hard to understand. Any help would be appreciated : )
To my understanding of the code, "speculative page-out" works like this: during `LmsStorageImpl::unpin()` execution, LMS can speculatively page out (`cudaMemcpyAsync`) the newly unpinned/inactive memory block from GPU to CPU, so that the later reclaim execution (`DeviceCachingAllocator::reclaim_block()`) can skip the page-out entirely. But how LMS decides (`predict()` in terms of the code) whether an unpinned memory block should be paged out speculatively is still beyond me.

Thank you!