Prior to this commit, communicators allocated by ncclCommInitRank were never released by a matching call to ncclCommDestroy. Although CCLThreadLocalContext::Clear does call ncclCommDestroy, nothing ever called Clear. On worker processes, the missing ncclCommDestroy call typically had little effect: any destruction would have occurred shortly before the process exited, and the OS reclaims the resources when the process terminates anyway.
However, worker0 of a Disco session is a separate thread, rather than a separate process. While this allows it to easily receive data from the controller thread, resources allocated by worker0 are not reclaimed by the OS until the entire process terminates. As a result, the CCLThreadLocalContext leaked GPU memory, because the communicator created by the ncclCommInitRank call at the start of each tvm.runtime.disco.ProcessSession was never de-allocated. GPU memory usage grew by about 1 gigabyte for each ProcessSession.
This commit adds a destructor to CCLThreadLocalContext that calls the Clear method. For worker0, the destructor runs when the worker thread is joined to the main thread.
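The pattern can be illustrated with a minimal, self-contained sketch. It is not the TVM implementation: `FakeComm`, `FakeCommInit`, `FakeCommDestroy`, `ThreadLocalContext`, and `LiveCommsAfterWorkerThread` are hypothetical stand-ins for the NCCL communicator, ncclCommInitRank, ncclCommDestroy, and CCLThreadLocalContext, and the sketch assumes the context is held in `thread_local` storage. A global counter takes the place of GPU memory so that the leak is observable without a GPU.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for an NCCL communicator. The counter tracks how
// many communicators are currently alive, in place of real GPU memory.
std::atomic<int> g_live_comms{0};

struct FakeComm {};

// Stand-in for ncclCommInitRank (assumption: not the real NCCL signature).
FakeComm* FakeCommInit() {
  ++g_live_comms;
  return new FakeComm();
}

// Stand-in for ncclCommDestroy (assumption: not the real NCCL signature).
void FakeCommDestroy(FakeComm* comm) {
  delete comm;
  --g_live_comms;
}

// Hypothetical analogue of CCLThreadLocalContext.
struct ThreadLocalContext {
  FakeComm* comm = nullptr;

  // Release any previous communicator, then allocate a new one.
  void Init() {
    Clear();
    comm = FakeCommInit();
  }

  // Release the communicator if one is held. Safe to call repeatedly.
  void Clear() {
    if (comm != nullptr) {
      FakeCommDestroy(comm);
      comm = nullptr;
    }
  }

  // The fix described above: the destructor calls Clear, so thread exit
  // releases the communicator even when nothing calls Clear explicitly.
  ~ThreadLocalContext() { Clear(); }

  // One context per thread, destroyed when that thread exits.
  static ThreadLocalContext* Get() {
    thread_local ThreadLocalContext ctx;
    return &ctx;
  }
};

// Runs one worker0-style thread inside the current process and returns the
// number of communicators still alive after the thread has been joined.
int LiveCommsAfterWorkerThread() {
  std::thread worker([] { ThreadLocalContext::Get()->Init(); });
  worker.join();
  return g_live_comms.load();
}
```

Calling `LiveCommsAfterWorkerThread()` repeatedly models creating several sessions in one process. With the destructor in place the live count returns to zero after each join; without it, each call would leave one more communicator alive until the process exits.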