davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

Is there a plan for more CUDA context API? #1504

Closed. windforce7 closed this issue 6 years ago.

windforce7 commented 6 years ago

This is a dlib Python API issue as well as a CUDA one. I have a Python application doing face image recognition and processing. dlib's CNN face detection model is used, and I have no problem compiling dlib with CUDA support. The project runs well when the input image is of modest size; otherwise RuntimeError: cudaMalloc() out of memory is raised and terminates the process.

Something we need to know is that a CUDA context is created for any call into the CUDA API (this description may be inaccurate; I'm not a professional in CUDA programming). Some run-time errors may "corrupt" the CUDA context, and any calls afterward will fail. Someone discussed this two years ago on StackOverflow: https://stackoverflow.com/questions/30909368/once-cudamalloc-returns-out-of-memory-every-cuda-api-call-returns-failure A function is provided to reset the context to a runnable state: cudaDeviceReset(). For C++ programmers, that seems to settle the matter.

The OOM error can be caught while calling the dlib Python API, so naturally I want to recover the context state after catching it. At first, I tried setting all references to dlib resources to None and reloading the dlib module, which didn't work. Then I wrote a CUDA program, something like this:

// Release all references to CUDA resources before running this,
// otherwise you'll get a segmentation fault.
#include <iostream>
#include <cuda_runtime.h>

class CudaCtx {
public:
  void reset_ctx(int gpu_no);
};

void CudaCtx::reset_ctx(int gpu_no) {
  cudaSetDevice(gpu_no);
  cudaError_t error = cudaGetLastError();
  if (error != cudaSuccess) {
    // Expected branch: this would hopefully pick up the error code left behind by dlib.
    std::cout << cudaGetErrorString(error) << std::endl;
  } else {
    // What I actually get.
    std::cout << "nothing to reset" << std::endl;
  }
  // Clean up allocated GPU memory regardless.
  cudaDeviceReset();
}

extern "C" {
  CudaCtx cc;
  void reset_ctx(int gpu_no) {
    cc.reset_ctx(gpu_no);
  }
}

I compiled it with nvcc and called it from the Python project. The cudaDeviceReset() call did clean up all the memory allocated by the Python process. However, the error state caused by dlib in the same process is not caught (and the main Python process continues to report OOM from cudaGetLastError() afterwards). This indicates that the dlib functions and the cleanup program are in different CUDA contexts. Even so, the cleanup program did manage to clean up memory that does not belong to it (as can be seen by monitoring GPU memory usage in nvidia-smi).
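For illustration only, here is a minimal sketch (not part of the program above) of how the effect of the reset could be checked from within the process instead of watching nvidia-smi. cudaMemGetInfo() reports free/total memory for the whole device, so it reflects allocations made under other contexts as well; the helper name check_reset is made up for this sketch:

#include <iostream>
#include <cuda_runtime.h>

// Hypothetical helper: print free/total device memory before and after
// cudaDeviceReset() to verify that the reset actually releases allocations.
void check_reset(int gpu_no) {
  cudaSetDevice(gpu_no);

  size_t free_before = 0, total = 0;
  cudaMemGetInfo(&free_before, &total);
  std::cout << "free before reset: " << free_before << " / " << total << std::endl;

  // Destroys all allocations and state of the primary context on this device.
  cudaDeviceReset();

  // Touch the device again so a fresh context exists for the query.
  cudaSetDevice(gpu_no);
  size_t free_after = 0;
  cudaMemGetInfo(&free_after, &total);
  std::cout << "free after reset:  " << free_after << " / " << total << std::endl;
}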

I couldn't find any dlib API for resetting the CUDA context, so there's nothing I can do to recover the context state other than killing the whole process. Is there any plan for such an API? Any suggestions are also very much appreciated here. Thank you.

davisking commented 6 years ago

Yes, there are some CUDA-related context objects. For example, https://github.com/davisking/dlib/blob/master/dlib/cuda/cublas_dlibapi.cpp#L89. They all have that same thread_local construction. You are welcome to submit a PR that adds a method to reset these CUDA context objects. However, in general I think the right thing to do is to avoid running out of memory in the first place, by doing appropriate input validation and not running images that would lead to memory exhaustion.
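For readers following along, the construction being referred to looks roughly like the following. This is a simplified sketch of the thread_local handle pattern, not dlib's actual code, and the reset_thread_local_cublas_handle name is invented here to show what such a reset method could look like:

#include <cublas_v2.h>

namespace sketch {
  // Each thread lazily creates and owns its own cuBLAS handle,
  // mirroring the thread_local construction in dlib/cuda/cublas_dlibapi.cpp.
  class cublas_context {
  public:
    cublas_context()  { cublasCreate(&handle); }
    ~cublas_context() { cublasDestroy(handle); }

    cublasHandle_t get() const { return handle; }

    // Hypothetical reset: destroy the current handle and create a fresh one.
    void reset() {
      cublasDestroy(handle);
      cublasCreate(&handle);
    }

  private:
    cublasHandle_t handle;
  };

  cublas_context& context() {
    thread_local cublas_context ctx;
    return ctx;
  }

  // What a user-facing reset function might look like.
  void reset_thread_local_cublas_handle() {
    context().reset();
  }
}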

windforce7 commented 6 years ago

Thank you @davisking. I'll see what I can do, but not very soon.