Open hughperkins opened 8 years ago
After commenting out a bunch of stuff around the place, I'm fairly sure that there is a leak which is not in any of the following places (which doesn't mean these places don't contain an additional leak):
Edit1:
Seems like it's something to do with the map, in xgemm.cc, which increases in size, without limit:
Add to xgemm.cc, after the line (*kernel_map)[key] = *clKernel;
printf("map size after put %lu\n", (unsigned long)kernel_map->size());
Result:
map size after put 1
map size after put 2
map size after put 3
map size after put 4
map size after put 5
map size after put 6
...
It's a static map, embedded in a function, so it will be challenging to figure out a workable solution I think?
I think one way to handle this, without having to think about threading, contexts and so on, is to have something like a 'setup version number', which is incremented each time we call setup (or maybe teardown, but it comes to the same thing). In makeGemmKernel, there is a static int holding the last seen setup version number. If it doesn't match the current setup version number, then makeGemmKernel .... oh... hmmm. It's too late to release those kernels by then, actually... :-(
Per http://stackoverflow.com/questions/15067160/stdmap-thread-safety/15067564#15067564 , you can read/write a map from different threads, as long as you don't access the same items, or iterate. Sooo... pondering:
(Not sure if this would work on Windows easily though :-( Seems like this is non-obvious to do http://stackoverflow.com/questions/1679243/getting-the-thread-id-from-a-thread )
@hughperkins the map needs to be a thread-local static variable. Otherwise you'll end up with a bug or compile the kernels repeatedly. I think the better option would be to clear the map from teardown.
OK, I see that's exactly what you suggested. I don't know how thread IDs work with non-C++11 threads (for example, if someone is using OpenMP instead of C++11 threads).
But I have to ask: in what situation would you really need to call setup and teardown repeatedly? You would only ever need to call them once in any program.
Well, in my unit tests for example. Since I need to test kernel creation, caching etc, I need to tear down the OpenCL context at the end of each test, and create a new one for the next test. Not doing this would work around the issue, but would mean my tests miss a whole bunch of issues in my own code.
How to clear the map from teardown? Since the map is thread-local, it seems that teardown won't have access to other threads' maps? Or do you mean each thread should be calling teardown?
@hughperkins I updated my earlier comment. This is an interesting problem. An alternative to using thread id would be to register the map during construction in a mutex locked variable (like an std::vector of pointers to the map). When teardown is called, you can just go through the registry and delete all of them.
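For illustration, the registry idea could look roughly like the sketch below. This is not clBLAS code: `FakeKernel`, `KernelMap`, `threadKernelMap` and `clearAllKernelMaps` are hypothetical names, the kernel handle is a plain int rather than a cl_kernel, and the sketch clears each map rather than deleting it so that the per-thread pointers stay valid. It also assumes teardown is only called while no other thread is inside a gemm call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled kernel handle (the real map stores cl_kernel).
using FakeKernel = int;
using KernelMap = std::map<std::string, FakeKernel>;

// Registry of every per-thread map, guarded by a mutex.
struct MapRegistry {
    std::mutex lock;
    std::vector<KernelMap*> maps;
};

inline MapRegistry& registry() {
    static MapRegistry r;
    return r;
}

// Returns the calling thread's map, registering it on first use.
// The map is intentionally never freed here; it outlives the thread.
inline KernelMap& threadKernelMap() {
    thread_local KernelMap* map = nullptr;
    if (map == nullptr) {
        map = new KernelMap();
        std::lock_guard<std::mutex> guard(registry().lock);
        registry().maps.push_back(map);
    }
    return *map;
}

// Meant to be called from teardown: empties every registered map and
// returns how many cached entries were dropped.
inline std::size_t clearAllKernelMaps() {
    std::lock_guard<std::mutex> guard(registry().lock);
    std::size_t released = 0;
    for (KernelMap* m : registry().maps) {
        assert(m != nullptr);
        released += m->size();  // real code would call clReleaseKernel on each entry
        m->clear();
    }
    return released;
}
```

A thread would use `threadKernelMap()[key] = kernel;` where the static map is used today, and teardown would call `clearAllKernelMaps()` before releasing the context.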
Yes, a mutex-locked variable sounds correct. Since it sounds like a bunch of work, and socialization and so on, and since my own use case is single-threaded for now, I think I'm just going to create my own fork, make the map non-thread-local, and clear it in teardown. If I ever have to think about threading, I'll put a mutex around it, and create a PR into the develop branch.
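A minimal single-threaded sketch of that simpler approach, for illustration only. All names here (`g_kernelCache`, `toySetup`, `toyMakeGemmKernel`, `toyTeardown`) are made up, and the sketch assumes the cache key changes with every new context, which would explain why the map keeps growing across setup/teardown cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

using ToyKernelHandle = int;  // hypothetical stand-in for cl_kernel

// File scope instead of function-local static, so teardown can reach it.
std::map<std::string, ToyKernelHandle> g_kernelCache;
int g_contextGeneration = 0;  // hypothetical: bumped on every setup

void toySetup() { ++g_contextGeneration; }

ToyKernelHandle toyMakeGemmKernel(const std::string& name) {
    assert(!name.empty());
    // Assumption: the key depends on the current context, so a new context
    // never hits entries cached under an older one.
    const std::string key = name + "@" + std::to_string(g_contextGeneration);
    auto it = g_kernelCache.find(key);
    if (it != g_kernelCache.end()) return it->second;  // cache hit
    const ToyKernelHandle k = static_cast<ToyKernelHandle>(g_kernelCache.size()) + 1;
    g_kernelCache[key] = k;
    return k;
}

// Drops every cached kernel and returns how many there were.
std::size_t toyTeardown() {
    const std::size_t released = g_kernelCache.size();
    g_kernelCache.clear();  // real code would call clReleaseKernel on each entry first
    return released;
}
```

Without the `toyTeardown()` call, each `toySetup()` followed by `toyMakeGemmKernel("sgemm")` adds one more entry; with it, the cache starts empty on every cycle.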
@hughperkins no reason to fork. I can send in a patch that'll fix this in < 1 hour. Try to be as close to upstream as possible :)
Oh, nice! :-)
@hughperkins me sending a patch doesn't mean it will be merged immediately. You can test the patch though.
Yes, understood :-)
So, I actually created a non-mutexed, single-threaded teardown for the xgemm map: https://github.com/clMathLibraries/clBLAS/compare/develop...hughperkins:xgemm_teardown?expand=1 However, that doesn't stop the leak. I haven't measured the exact leak rate, and I assume it's gone down a bit, but it's certainly not zero yet. Perhaps I need to measure the leak rate...
So, I took some measurements of memory usage. Turns out that the memory leak is about 70MB per setup/sgemm/teardown cycle. The sucky thing is that this is true with or without xgemm map teardown :-(
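One way to take such a measurement from inside the test process is sketched below. It is Linux-only and is not necessarily how the numbers above were obtained; `parseResidentPages` and `residentBytes` are hypothetical helper names. It reads the resident set size from /proc/self/statm, whose second field is the number of resident pages.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

// Parses one line in /proc/self/statm format ("size resident shared ...")
// and returns the resident page count, or -1 if the line cannot be parsed.
inline long parseResidentPages(const std::string& statm) {
    std::istringstream in(statm);
    long totalPages = 0;
    long residentPages = 0;
    if (!(in >> totalPages >> residentPages)) return -1;
    return residentPages;
}

// Returns the current resident set size in bytes, or -1 if it is unavailable
// (for example on a system without /proc).
inline long long residentBytes() {
    std::ifstream f("/proc/self/statm");
    std::string line;
    if (!std::getline(f, line)) return -1;
    const long pages = parseResidentPages(line);
    const long pageSize = sysconf(_SC_PAGESIZE);
    if (pages < 0 || pageSize <= 0) return -1;
    return static_cast<long long>(pages) * pageSize;
}
```

Calling `residentBytes()` before and after each setup/sgemm/teardown cycle and printing the difference gives a per-cycle figure that can be compared with and without the map teardown.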
(Note: seems I had releasequeue and releasecontext in the wrong order earlier; corrected, but the issue persists.) Inspecting memory usage step by step. Without gemm, memory usage changes approximately like:
So, the memory change is all associated with setting up / tearing down context. The strange thing is, after calling gemm, in this version of clblas, releasing the context no longer releases the memory:
(Edit: seems like clGetContextInfo with CL_CONTEXT_REFERENCE_COUNT will be useful https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetContextInfo.html )
(Edit2: reference count seems to be always reported as 1. I wonder if it's a NOP on certain GPUs?)
Edit3: so I guess that, per https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clReleaseContext.html , the context is either deleted entirely or not at all. So, if there are any references left over, it won't be deleted. Releasing the kernel is essential, and there's probably some other object(s) somewhere not being released either...
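To make the all-or-nothing behaviour concrete, here is a toy model. It is not the OpenCL API: `ToyContext`, `attachObject`, `releaseObject` and `releaseContext` are invented for illustration. It follows the clReleaseContext wording that the context is deleted once its reference count is zero and all objects attached to it have been released, and it tracks attached objects separately from the reference count, which would be consistent with the count being reported as 1 while kernels are still alive.

```cpp
#include <cassert>

// Toy model only: the context is deleted when its own reference count is zero
// AND no attached objects (kernels, events, queues, buffers) remain.
struct ToyContext {
    int refCount = 1;         // reference held by whoever created the context
    int attachedObjects = 0;  // objects created against this context
    bool deleted = false;

    void maybeDelete() {
        if (refCount == 0 && attachedObjects == 0) deleted = true;
    }
};

// Models creating a kernel, event, etc. on the context.
inline void attachObject(ToyContext& ctx) {
    assert(!ctx.deleted);
    ++ctx.attachedObjects;
}

// Models clReleaseKernel / clReleaseEvent on such an object.
inline void releaseObject(ToyContext& ctx) {
    assert(ctx.attachedObjects > 0);
    --ctx.attachedObjects;
    ctx.maybeDelete();
}

// Models the final release of the context by its owner.
inline void releaseContext(ToyContext& ctx) {
    assert(ctx.refCount > 0);
    --ctx.refCount;
    ctx.maybeDelete();
}
```

In this model, a cached kernel and a forgotten event each keep `deleted` false after `releaseContext`, and the context only goes away once both have been released.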
Edit4: oops, missing a clReleaseEvent in my code above :-P Added in.
Yay, leak gone :-) I had to do the following things to remove it:
To reproduce, run:
Monitor memory usage whilst running, and Ctrl-C out within ~5-10 seconds, to prevent the computer from freezing.