ddemidov / vexcl

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
http://vexcl.readthedocs.org
MIT License

How do you recommend deallocating device memory when using VexCL #66

Closed agerlach closed 11 years ago

agerlach commented 11 years ago

@ddemidov, you must be getting sick of seeing all my questions here and on the odeint project. I am about one month away from finishing my dissertation, and I truly appreciate all your help and patience.

As I previously mentioned, I am using odeint + vexcl to do parameter studies on ensembles of ODEs. The main portion of my code is in Matlab, and I call odeint + vexcl through a mex interface. A single call works perfectly, but I encounter problems when I call the mex in a loop.

After around 8-9 mex calls, I find that no devices are available when creating a vex::Context. I check this with:

vex::Context ctx( vex::Filter::Type(CL_DEVICE_TYPE_GPU) );
if (!ctx) throw std::runtime_error("No devices available.");

By running

nvidia-smi

in the terminal, I see that each mex call costs me ~100 MB of device RAM. Once the device is out of RAM, I encounter the failure mentioned above. Interestingly, once I recompile the mex, the device RAM is freed.

So, how do you recommend freeing device memory allocated by

typedef vex::vector< float >    vector_type;
typedef vex::multivector< float, 5 > state_type;
...
vector_type input( ctx.queue() , input_host );
state_type X(ctx.queue(), n);

ddemidov commented 11 years ago

The memory associated with vex containers should in principle be deallocated when the containers are destroyed. Do you have to initialize vex::Context on every iteration? Could you initialize it only once? I have not worked with a mex interface, because I prefer to create a shared library and interact with it through loadlibrary/calllib calls.

In case you absolutely have to repeatedly initialize vex::Context, you can try to call vex::purge_kernel_caches() before destroying the context. That could possibly help with the issue.
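One way to picture why purging could help is a toy model in which a process-wide cache shares ownership of each context it has compiled kernels for. This is only a sketch of that general mechanism, not VexCL's actual implementation: `FakeContext`, `FakeKernel`, `kernel_cache`, `purge_fake_kernel_cache` and `fake_mex_call` are all hypothetical stand-ins, and no OpenCL is involved.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a device context; counts live instances.
struct FakeContext {
    static int alive;
    FakeContext()  { ++alive; }
    ~FakeContext() { --alive; }
};
int FakeContext::alive = 0;

// Hypothetical stand-in for a compiled kernel that shares ownership
// of the context it was built for.
struct FakeKernel {
    std::shared_ptr<FakeContext> ctx;
};

// Process-wide cache that outlives any single function call.
std::map<std::string, FakeKernel>& kernel_cache() {
    static std::map<std::string, FakeKernel> cache;
    return cache;
}

// Toy analogue of purging: drop every cached kernel.
void purge_fake_kernel_cache() { kernel_cache().clear(); }

// One simulated "mex call": create a context, cache a kernel for it,
// and optionally purge the cache before returning.
void fake_mex_call(int call_id, bool purge) {
    auto ctx = std::make_shared<FakeContext>();
    kernel_cache()["kernel_" + std::to_string(call_id)] = FakeKernel{ctx};
    if (purge) purge_fake_kernel_cache();
}
```

In this model, calling `fake_mex_call(i, false)` repeatedly leaves one live `FakeContext` per call, because the cache still references each of them after the local handle goes out of scope. Calling it with `purge` set to `true` leaves none.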

agerlach commented 11 years ago

Sorry for taking so long to respond to this. I got put on another project for the last week. vex::purge_kernel_caches() works.

I did not know about loadlibrary/calllib in Matlab until now. I'll look into that in the future. Thanks!

ddemidov commented 11 years ago

Ok, that means the memory was spent on creating new OpenCL contexts and caching compiled kernels. If your function looks like this:

vex::Context ctx(...);

// do the work

vex::purge_kernel_caches();

then you could try to make the instance of vex::Context static:

// Create context once:
static vex::Context ctx(...);

// do the work

// Keep cache of compiled kernels across function calls:
// vex::purge_kernel_caches();

This way you won't spend time on context initialization and recompilation of the same kernels over and over again.
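The effect of the `static` keyword here can be checked without any device code. The sketch below uses a hypothetical `CountingContext` in place of vex::Context and two hypothetical functions, and only counts how often the constructor runs. It relies on the standard C++ rule that a function-local static is initialized the first time control passes through its declaration and then lives until the program (or, presumably, the loaded library) is torn down.

```cpp
#include <cassert>

// Hypothetical stand-in for vex::Context; it only counts constructions.
struct CountingContext {
    static int constructed;
    CountingContext() { ++constructed; }
};
int CountingContext::constructed = 0;

// Original pattern: a fresh context is built on every call.
void work_with_local_context() {
    CountingContext ctx;
    (void)ctx;  // do the work
}

// Suggested pattern: the context is built on the first call only
// and reused by every later call.
void work_with_static_context() {
    static CountingContext ctx;
    (void)ctx;  // do the work
}
```

Calling `work_with_local_context()` five times bumps the counter five times, while five calls to `work_with_static_context()` bump it once. The trade-off is that the static instance is never destroyed between calls, so whatever it holds stays allocated for as long as the module remains loaded.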