Closed: bathal1 closed this 3 years ago
Hi @bathal1! Thanks for the report and repro. I can confirm the issue on Windows: there is indeed a memory leak when destroying the GL context, because CUDA-mapped graphics resources are apparently not freed along with the context.
Note that creating a GL context is a slow operation, and you should not do it at every call to rasterize() but only once, reusing the same context thereafter (a usage sketch follows at the end of this comment). However, if your use case requires frequent context creation/destruction, please try the following patch until we make a new release:
In nvdiffrast/torch/torch_rasterize.cpp, add the following lines:
RasterizeGLStateWrapper::~RasterizeGLStateWrapper(void)
{
+   setGLContext(pState->glctx);                        // make this state's GL context current
+   rasterizeReleaseBuffers(NVDR_CTX_PARAMS, *pState);  // unregister the CUDA-mapped graphics resources
+   releaseGLContext();                                 // release the context before destroying it
    destroyGLContext(pState->glctx);
    delete pState;
}
In nvdiffrast/common/rasterize.h, add the following line:
void rasterizeInitGLContext(NVDR_CTX_ARGS, RasterizeGLState& s, int cudaDeviceIdx);
void rasterizeResizeBuffers(NVDR_CTX_ARGS, RasterizeGLState& s, int posCount, int triCount, int width, int height, int depth);
void rasterizeRender(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, const float* posPtr, int posCount, int vtxPerInstance, const int32_t* triPtr, int triCount, const int32_t* rangesPtr, int width, int height, int depth, int peeling_idx);
void rasterizeCopyResults(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, float** outputPtr, int width, int height, int depth);
+ void rasterizeReleaseBuffers(NVDR_CTX_ARGS, RasterizeGLState& s);
Finally, in nvdiffrast/common/rasterize.cpp, add the following function:
void rasterizeReleaseBuffers(NVDR_CTX_ARGS, RasterizeGLState& s)
{
    int num_outputs = s.enableDB ? 2 : 1;

    if (s.cudaPosBuffer)
    {
        NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaPosBuffer));
        s.cudaPosBuffer = 0;
    }

    if (s.cudaTriBuffer)
    {
        NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaTriBuffer));
        s.cudaTriBuffer = 0;
    }

    for (int i=0; i < num_outputs; i++)
    {
        if (s.cudaColorBuffer[i])
        {
            NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaColorBuffer[i]));
            s.cudaColorBuffer[i] = 0;
        }
    }

    if (s.cudaPrevOutBuffer)
    {
        NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaPrevOutBuffer));
        s.cudaPrevOutBuffer = 0;
    }
}
On my computer this leads to GPU memory usage remaining fixed over iterations of render_dummy(), as expected. Please let me know if you still experience problems; I have not tested this on Linux.
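As a minimal sketch of the create-once-and-reuse pattern mentioned above, assuming the nvdiffrast.torch API (the geometry tensors here are illustrative placeholders):

import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeGLContext()  # slow to create; do this once and reuse it

def render(pos, tri, res=512):
    # pos: float32 [num_views, num_vertices, 4] clip-space positions
    # tri: int32 [num_triangles, 3] vertex indices
    rast, _ = dr.rasterize(glctx, pos, tri, resolution=[res, res])
    return rast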
Hi @s-laine, thanks for your quick reply! I just tried the fix on my end (Ubuntu 20.04) and it works like a charm, thanks a lot!
Just to clarify: I do not create the GL context at every rendering call in my actual code; it is only created once per call to my "main" optimization function. The problem arises when I call this function several times in a row, e.g. when doing some parameter search.
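A hypothetical sketch of that per-run pattern (render_with and the loop structure are invented names, not the actual code; each run creates and destroys one context, which is what leaked before the fix):

import nvdiffrast.torch as dr

def optimize(params, num_steps=100):
    glctx = dr.RasterizeGLContext()  # one context per optimization run
    for _ in range(num_steps):
        rast = render_with(glctx, params)  # placeholder for the real rendering step
        # ... compute loss, backprop, update params ...
    return params
    # glctx goes out of scope on return; before the fix its buffers were never freed

# Repeated runs, e.g. a parameter search, accumulate the leak:
for lr in (1e-1, 1e-2, 1e-3):
    optimize({'lr': lr})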
Great to hear that this solved the problem for you. I'll keep this issue open until we have released a version that includes the fix.
Fix included in v0.2.6, closing.
Hello, when running several optimizations in a script I noticed that my GPU eventually runs out of memory, causing the script to fail. Looking at nvidia-smi after each optimization run, it seems that some memory is never freed (except when the process is killed, of course). Here is a minimal reproducer:
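The original snippet did not survive here; a hypothetical reproducer consistent with the discussion (a fresh GL context per call; the resolution and viewpoint count are placeholders) might look like:

import torch
import nvdiffrast.torch as dr

def render_dummy(num_views=64, res=512):
    glctx = dr.RasterizeGLContext()  # fresh context on every call: this is what leaks
    pos = torch.randn(num_views, 3, 4, device='cuda')  # one random clip-space triangle per view
    tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device='cuda')
    rast, _ = dr.rasterize(glctx, pos, tri, resolution=[res, res])
    # glctx is destroyed when this function returns; before v0.2.6 its
    # CUDA-mapped buffers were never freed, leaking GPU memory on each call
    return float(rast.sum())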
Then, running render_dummy() several times in a Jupyter notebook and checking nvidia-smi between calls shows the memory used by the process growing. Alternatively, running it in a loop should be enough to make the GPU run out of memory (I have a Titan RTX on my end).
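For reference, the used memory can also be polled programmatically rather than eyeballing nvidia-smi; a small sketch assuming the pynvml package and the render_dummy() from the sketch above (not part of the original report):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

def used_mib():
    # Device-wide used memory, matching what nvidia-smi reports.
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20

for i in range(10):
    render_dummy()
    print(f"after call {i}: {used_mib():.0f} MiB used")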
The size of the leak seems to be proportional to the number of viewpoints or the resolution, which makes me suspect that the framebuffer is not properly freed (provided I'm not to blame here 😅). For example, with the resolution and viewpoints in the example above, the leak on my end is 1080 MiB, which is pretty close to the size of the result of rasterize() (see the back-of-the-envelope check below).

Also, here's the log output from running the dummy rendering function once with dr.set_log_level(0):

I initially noticed this behavior using nvdiffrast v0.2.0, but I have since updated to 0.2.5, which didn't change anything.
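To sanity-check that 1080 MiB figure (the actual resolution and viewpoint count were in the omitted snippet, so the numbers here are only illustrative): rasterize() returns 4 float32 values per pixel per view, i.e. num_views × height × width × 4 × 4 bytes. For example, 270 views at 512×512 would come to 270 × 512 × 512 × 16 B = 1,132,462,080 B = exactly 1080 MiB.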