Closed wincrash closed 6 years ago
Hi,
this issue is currently the major showstopper for my application, so I did some investigating. It turns out that the culprit is not sort() but exclusive_scan() which is called by boost::compute's radix sort implementation. In other words, replace your sort() calls with calls to exclusive_scan() and you will see the same behavior.
I could further trace the problem to detail::scan_impl(). Curiously, the problem does not affect inclusive_scan, which also makes use of scan_impl(), and scan_impl() also works fine if the exclusive boolean parameter is set to false.
The weird thing is that scan_impl() itself doesn't even seem to depend on the exclusive boolean parameter, it is simply forwarded to local_scan_kernel. However, as a meta kernel object, local_scan_kernel just builds kernel source code which is slightly different if exclusive==true. How could this leak memory? Or am I on the wrong track? I'm not very familiar with internals of boost::compute, so it could well be that I've missed something.
Update: I've found that if I change the line 113 in scan_on_gpu.hpp (this is in the local_scan_kernel) from
op(first[expr<cl_uint>("gid")], var<T>("scratch[lid]"))
to
first[expr<cl_uint>("gid")] <<"+"<<var<T>("scratch[lid]"))
the problem disappears. It's still unclear to me how the call to the binary operator could cause such a massive memory leak. Since the call to op()
with two var<T>()
arguments a few lines above doesn't cause any issues, I suspect it might be the combination of op()
with first[expr...]
?
It is indeed the combination of op()
and first[expr...]
. Replacing the code starting at line 111 of scan_on_gpu.hpp with
"if(lid == block_size - 1){\n" <<
" " << decl<T>("temp") << " = " << first[expr<cl_uint>("gid")] <<";\n"<<
" block_sums[get_group_id(0)] = " <<''
op(var<T>("temp"), var<T>("scratch[lid]")) <<
";\n" <<
"}\n";
gets rid of the memory leak. This could be a possible workaround until the root cause is found.
The compute::reduce() function also leaks (except if the binary function is compute::plus<> ) See the following example:
BOOST_AUTO_TEST_CASE(reduce_twos)
{
using compute::uint_;
compute::vector<uint_> vector(8, context);
const compute::buffer& buff = vector.get_buffer();
auto rc1 = buff.get_info<CL_MEM_REFERENCE_COUNT>();
{
compute::fill(vector.begin(), vector.end(), uint_(2), queue);
uint_ product;
compute::reduce(vector.begin(), vector.end(), &product, compute::multiplies<uint_>(), queue);
BOOST_CHECK_EQUAL(product, uint_(256));
}
auto rc2 = buff.get_info<CL_MEM_REFERENCE_COUNT>();
BOOST_CHECK_EQUAL(rc1, rc2);
}
The two reference count should be equal. The problem comes from the instruction
size_t detail::reduce(InputIterator first, size_t count, OutputIterator result,
size_t block_size, BinaryFunction function, command_queue &queue)
...
k <<
function(first[k.make_var<uint_>("gid*2+0")],
first[k.make_var<uint_>("gid*2+1")]) << ";\n";
IMHO the
~buffer_iterator_index_expr()
{
// set buffer to null so that its reference count will
// not be decremented when its destructor is called
m_buffer.get() = 0;
}
leaves the extra buffer reference.
@jszuppe Hi. I used exclusive_scan and reduce function. My program is also faced with memory leak. Could you check and modify the code ?
IMHO the
~buffer_iterator_index_expr() { // set buffer to null so that its reference count will // not be decremented when its destructor is called m_buffer.get() = 0; }
leaves the extra buffer reference.
It should not leave extra buffer reference since in the constructor:
buffer_iterator_index_expr(const buffer &buffer,
size_t index,
const memory_object::address_space address_space,
const IndexExpr &expr)
: m_buffer(buffer.get(), false),
...
it does not increase the reference counter of buffer. However, I'll check it.
It seems that default copy ctor is causing this problem. I'll check if that [adding it] fixes the problem in other reported cases.
It seems that it fixes all reported issues reported in this thread.
Should be fixed by #780. Reopen if you still get this problem on develop
branch.
Thanks. Seems memory leak issue solved.
Hello, I have found that using sort or sort_by_key command multiple times i get memory leak (memory usage is growing). In follow example i get the memory leak, if i just comment the line "boost::compute::sort" memory leak disappears. Why that happening? Maybe i doing something wrong?