boostorg / compute

A C++ GPU Computing Library for OpenCL
http://boostorg.github.io/compute/
Boost Software License 1.0
1.55k stars 333 forks source link

Memory leak #746

Closed wincrash closed 6 years ago

wincrash commented 6 years ago

Hello, I have found that using sort or sort_by_key command multiple times i get memory leak (memory usage is growing). In follow example i get the memory leak, if i just comment the line "boost::compute::sort" memory leak disappears. Why that happening? Maybe i doing something wrong?

#include <iostream>
#include <boost/compute/core.hpp>
#include <boost/compute.hpp>
namespace compute = boost::compute;
int main()
{
    auto device = boost::compute::system::devices().at(0);
    auto context = boost::compute::context(device);
    auto queue = boost::compute::command_queue(context, device);
    std::cout << device.name() << "\n";

    int count=1000000;
    std::vector<float> a;
    std::vector<float> b;
    std::vector<float> c;
    a.resize(count);
    b.resize(count);
    c.resize(count);

    for(int i=0;i<count;i++){
        a[i]=i*2+1;
        b[i]=(i+1)*2;
        c[i]=0;
    }
    compute::vector<float> buffer_a( context);
    compute::vector<float> buffer_b( context);
    compute::vector<float> buffer_c(context);
    buffer_a.resize(count,queue);
    buffer_b.resize(count,queue);
    buffer_c.resize(count,queue);

    boost::compute::copy(a.begin(),a.end(),buffer_a.begin(),queue);
    boost::compute::copy(b.begin(),b.end(),buffer_b.begin(),queue);
    boost::compute::copy(c.begin(),c.end(),buffer_c.begin(),queue);

    // source code for the add kernel
    const char source[] =
            "__kernel void add(__global const float *a,"
            "                  __global const float *b,"
            "                  __global float *c)"
            "{"
            "    const uint i = get_global_id(0);"
            "    c[i] = a[i] + b[i];"
            "}";

    // create the program with the source
    compute::program program =
            compute::program::create_with_source(source, context);

    // compile the program
    program.build();

    // create the kernel
    compute::kernel kernel(program, "add");

    // set the kernel arguments
    kernel.set_arg(0, buffer_a);
    kernel.set_arg(1, buffer_b);
    kernel.set_arg(2, buffer_c);

    // run the add kernel
    for(int i=0;i<100000;i++){
        queue.enqueue_1d_range_kernel(kernel, 0, count, 0);
        boost::compute::sort(buffer_c.begin(),buffer_c.end(),queue);
        queue.finish();
    }
    return 0;
}
illuhad commented 6 years ago

Hi,

this issue is currently the major showstopper for my application, so I did some investigating. It turns out that the culprit is not sort() but exclusive_scan() which is called by boost::compute's radix sort implementation. In other words, replace your sort() calls with calls to exclusive_scan() and you will see the same behavior.

I could further trace the problem to detail::scan_impl(). Curiously, the problem does not affect inclusive_scan, which also makes use of scan_impl(), and scan_impl() also works fine if the exclusive boolean parameter is set to false.

The weird thing is that scan_impl() itself doesn't even seem to depend on the exclusive boolean parameter, it is simply forwarded to local_scan_kernel. However, as a meta kernel object, local_scan_kernel just builds kernel source code which is slightly different if exclusive==true. How could this leak memory? Or am I on the wrong track? I'm not very familiar with internals of boost::compute, so it could well be that I've missed something.

Update: I've found that if I change the line 113 in scan_on_gpu.hpp (this is in the local_scan_kernel) from op(first[expr<cl_uint>("gid")], var<T>("scratch[lid]")) to first[expr<cl_uint>("gid")] <<"+"<<var<T>("scratch[lid]")) the problem disappears. It's still unclear to me how the call to the binary operator could cause such a massive memory leak. Since the call to op() with two var<T>() arguments a few lines above doesn't cause any issues, I suspect it might be the combination of op() with first[expr...]?

illuhad commented 6 years ago

It is indeed the combination of op() and first[expr...]. Replacing the code starting at line 111 of scan_on_gpu.hpp with

"if(lid == block_size - 1){\n" <<
"    " << decl<T>("temp") << " = " << first[expr<cl_uint>("gid")] <<";\n"<<
"    block_sums[get_group_id(0)] = " <<''
               op(var<T>("temp"), var<T>("scratch[lid]")) <<
                ";\n" <<
"}\n";

gets rid of the memory leak. This could be a possible workaround until the root cause is found.

ghavasi commented 6 years ago

The compute::reduce() function also leaks (except if the binary function is compute::plus<> ) See the following example:

BOOST_AUTO_TEST_CASE(reduce_twos)
{
    using compute::uint_;

    compute::vector<uint_> vector(8, context);
    const compute::buffer& buff = vector.get_buffer();
    auto rc1 = buff.get_info<CL_MEM_REFERENCE_COUNT>();
    {
        compute::fill(vector.begin(), vector.end(), uint_(2), queue);

        uint_ product;
        compute::reduce(vector.begin(), vector.end(), &product, compute::multiplies<uint_>(), queue);
        BOOST_CHECK_EQUAL(product, uint_(256));
    }
    auto rc2 = buff.get_info<CL_MEM_REFERENCE_COUNT>();
    BOOST_CHECK_EQUAL(rc1, rc2);
}

The two reference count should be equal. The problem comes from the instruction

size_t detail::reduce(InputIterator first, size_t count, OutputIterator result,
              size_t block_size, BinaryFunction function, command_queue &queue)
...
    k << 
        function(first[k.make_var<uint_>("gid*2+0")],
                        first[k.make_var<uint_>("gid*2+1")]) << ";\n";

IMHO the

    ~buffer_iterator_index_expr()
    {
        // set buffer to null so that its reference count will
        // not be decremented when its destructor is called
        m_buffer.get() = 0;
    }

leaves the extra buffer reference.

antibes0415 commented 6 years ago

@jszuppe Hi. I used exclusive_scan and reduce function. My program is also faced with memory leak. Could you check and modify the code ?

jszuppe commented 6 years ago

IMHO the

~buffer_iterator_index_expr()
{
// set buffer to null so that its reference count will
// not be decremented when its destructor is called
m_buffer.get() = 0;
}

leaves the extra buffer reference.

It should not leave extra buffer reference since in the constructor:

    buffer_iterator_index_expr(const buffer &buffer,
                               size_t index,
                               const memory_object::address_space address_space,
                               const IndexExpr &expr)
        : m_buffer(buffer.get(), false),
         ...

it does not increase the reference counter of buffer. However, I'll check it.

jszuppe commented 6 years ago

It seems that default copy ctor is causing this problem. I'll check if that [adding it] fixes the problem in other reported cases.

jszuppe commented 6 years ago

It seems that it fixes all reported issues reported in this thread.

jszuppe commented 6 years ago

Should be fixed by #780. Reopen if you still get this problem on develop branch.

antibes0415 commented 6 years ago

Thanks. Seems memory leak issue solved.