coin-or / CppAD

A C++ Algorithmic Differentiation Package: Home Page
https://cppad.readthedocs.io
Other
446 stars 94 forks source link

CppAD exception while running parallel loop #197

Closed juanmanzanero closed 3 months ago

juanmanzanero commented 4 months ago

Good morning,

I am running a very simple multithread example, which consists of the following steps:

1) Create an ADFun (and arbitrary dimension of 100 -> 100) 2) Duplicate this ADFun N_THREADS times and store it into a vector 3) Run several loops where we compute the Jacobian of the ADFuns, using for each thread their individual ADFun


#include <cppad/cppad.hpp>

void main()
{
    using AD = CppAD::AD<double>;

    const auto num_points = 100u;
    std::vector<AD> a_x(num_points); 

    CppAD::Independent(a_x);

    std::vector<AD> a_y(num_points);

    for (auto p = 0u; p < num_points; ++p) {

        a_y.at(p) = double{ 1 };

        for (auto j = 0u; j < p; ++j) {
            a_y.at(p) *= a_x.at(j);
        }
    }

    CppAD::ADFun<double> a_f(a_x, a_y);
    a_f.optimize();

    const auto omp_get_max_threads_ = omp_get_max_threads();

    std::vector<CppAD::ADFun<double>> v_funs(omp_get_max_threads_);

    std::fill(v_funs.begin(), v_funs.end(), a_f);

    for (auto loop = 0u; loop < 100u; ++loop) {

        std::cout << "loop: " << loop << std::endl;

#pragma omp parallel for
        for (size_t p = 1u; p < num_points; ++p) {
            const auto omp_get_thread_num_ = omp_get_thread_num();

            std::vector<double> x0(num_points, double{ 0 });

            v_funs[omp_get_thread_num_].Jacobian(x0);
        }

    }

The execution of the example results in the following CPPAD error:

cppad-20231008 error from unknown source
Error detected by false result for
    result >= info->count_inuse_
at line 264 in the file
    /home/juan/software/lioncpp-install/include/lion/thirdparty/include/cppad/utility/thread_alloc.hpp
math_test: /home/juan/software/lioncpp-install/include/lion/thirdparty/include/cppad/utility/error_handler.hpp:206: static void CppAD::ErrorHandler::Default(bool, int, const char*, const char*, const char*): Assertion `false' failed.
Aborted (core dumped)

and this error is raised after performing an arbitrary number of loops, in different program executions.

Running with valgrind does not raise the error, but its memory stats differ from execution to execution:

bradbell commented 4 months ago

CppAD can work with any threading system. In order to accomplish this, it uses an abstract interface for the threading. For an example of connecting openmp to this abstract interface; see https://cppad.readthedocs.io/latest/multi_thread.html

For a definition of the abstract interface; see https://cppad.readthedocs.io/latest/simple_ad_openmp.cpp.html

bradbell commented 4 months ago

Here is an example of setting up your calculation above:


#include <cppad/cppad.hpp>

namespace {
   bool in_parallel(void)
   {  return omp_in_parallel() != 0; }
   size_t thread_number(void)
   {  return static_cast<size_t>( omp_get_thread_num() ); }
}
int main()
{
    using AD = CppAD::AD<double>;
    const auto num_points = 100u;
    std::vector<AD> a_x(num_points);

    // setup for using CppAD in parallel
    assert( ! in_parallel() );
    const int int_num_threads = omp_get_max_threads();
    size_t num_threads        = size_t(int_num_threads);
    CppAD::thread_alloc::parallel_setup(
       num_threads, in_parallel, thread_number
    );
    CppAD::thread_alloc::hold_memory(true);
    CppAD::parallel_ad<double>();

    // parallel_ad does not make this call and perhaps it should
    // (it setups up CppAD::vector but not std::vector).
    CppAD::CheckSimpleVector<double, std::vector<double> >();

    CppAD::Independent(a_x);
    std::vector<AD> a_y(num_points);

    for (auto p = 0u; p < num_points; ++p) {
        a_y.at(p) = double{ 1 };
        for (auto j = 0u; j < p; ++j) {
            a_y.at(p) *= a_x.at(j);
        }
    }
    CppAD::ADFun<double> a_f(a_x, a_y);
    a_f.optimize();

    std::vector<CppAD::ADFun<double>> v_funs(num_threads);
    std::fill(v_funs.begin(), v_funs.end(), a_f);

    for (auto loop = 0u; loop < 100u; ++loop) {
        std::cout << "loop: " << loop << std::endl;
#pragma omp parallel for
        for (size_t p = 1u; p < num_points; ++p) {
            const auto omp_get_thread_num_ = omp_get_thread_num();
            std::vector<double> x0(num_points, double{ 0 });
            v_funs[omp_get_thread_num_].Jacobian(x0);
        }
    }
}
bradbell commented 4 months ago

@juanmanzanero If you are happy with the explanation above, please code this issue

juanmanzanero commented 4 months ago

Hello Brad,

Thank you very much for the very detailed response. Indeed I have tried your edits and I was successful to run the program.

Then, I run into a different problem: I have changed the example to SparseJac instead. After the parallel_ad, I added the following lines

CppAD::CheckSimpleVector<CppAD::AD<double>, std::vector<CppAD::AD<double> > >();
    CppAD::CheckSimpleVector<bool, CppAD::vectorBool>();
    CppAD::CheckSimpleVector<std::size_t, CppAD::vector<std::size_t> >();

and get the following error

cppad-20231224 error from a known source:
Attempt to return memory for a different thread while in parallel mode
Error detected by false result for
thread == thread_num() || (! in_parallel())
at line 938 in the file
../cppad/cppad-master/include/cppad/utility/thread_alloc.hpp
cppad-20231224 error from a known source:
Attempt to return memory for a different thread while in parallel mode
Error detected by false result for
thread == thread_num() || (! in_parallel())
at line 938 in the file
../cppad/cppad-master/include/cppad/utility/thread_alloc.hpp

I could solve the issue by defining the ADFun from ADFun a_f(a_x, a_y) to

CppAD::ADFun<double> a_f;
a_f.Dependent(a_x, a_y);
a_f.optimize();

This seems to solve the issue, I do not know if this is an expected behaviour, but I would like to know if so.

Thank you very much for your time.

Juan.

bradbell commented 4 months ago

The problem above seems of a more complex nature. Perhaps you could help track it down. If you would like to help, please try running the program in a debugger (like gdb) and finding out what the values of thread and thread_num() are ? Also, going up the stack frame, perhaps you can figure out what memory object is being created by one thread and deleted by another ?

It would be helpful to know if the problem happens when you change CppAD::thread_alloc::hold_memory(true); to CppAD::thread_alloc::hold_memory(false);

diegolodares commented 4 months ago

Hi @bradbell and @juanmanzanero,

I've been following this issue, and get the same error that Juan describes in the SparseJacobian example, both in Linux and Windows. I've used Visual Studio's debugger to see that the error is raised by function thread_alloc::return_memory, called by local::pod_vector_maybe's destructor, invoked on return by ADFun::capacity_order:

main -> SparseJacobian -> SparseJacobianFor -> Forward -> capacity_order

template <class Base, class RecBase>
void ADFun<Base,RecBase>::capacity_order(size_t c, size_t r)
{
   ...

   // replace taylor_ by new_taylor
   taylor_.swap(new_taylor);
   cap_order_taylor_     = c;
   num_order_taylor_     = p;
   num_direction_taylor_ = r;

   // note that the destructor for new_taylor will free the old taylor memory -> THE ERROR HAPPENS HERE
   return;
}

In thread_alloc::return_memory, thread_num() always returns the correct value, but local variable size_t thread = tc_index / num_cap; evaluates to 0 every time the error appears, as if new_taylor's data_ was corrupt or something like that. I've also seen that setting CppAD::thread_alloc::hold_memory(false) doesn't affect this behaviour.

Regards, and many thanks for this excellent project,

Diego

bradbell commented 4 months ago

@diegolodares Please provide an example that is as simple as possible and that leads to this assertion.

diegolodares commented 4 months ago
#include <cppad/cppad.hpp>

namespace {
   bool in_parallel(void)
   {  return omp_in_parallel() != 0; }
   size_t thread_number(void)
   {  return static_cast<size_t>( omp_get_thread_num() ); }
}
int main()
{
    using AD = CppAD::AD<double>;
    const auto num_points = 100u;
    std::vector<AD> a_x(num_points);

    // setup for using CppAD in parallel
    assert( ! in_parallel() );
    const int int_num_threads = omp_get_max_threads();
    size_t num_threads        = size_t(int_num_threads);
    CppAD::thread_alloc::parallel_setup(
       num_threads, in_parallel, thread_number
    );
    CppAD::thread_alloc::hold_memory(false);
    CppAD::parallel_ad<double>();

    // parallel_ad does not make this call and perhaps it should
    // (it setups up CppAD::vector but not std::vector).
    CppAD::CheckSimpleVector<double, std::vector<double> >();

    // I add these ones to call "SparseJacobian" in parallel
    CppAD::CheckSimpleVector<CppAD::AD<double>, std::vector<CppAD::AD<double> > >();
    CppAD::CheckSimpleVector<bool, CppAD::vectorBool>();
    CppAD::CheckSimpleVector<std::size_t, CppAD::vector<std::size_t> >();

    CppAD::Independent(a_x);
    std::vector<AD> a_y(num_points);

    for (auto p = 0u; p < num_points; ++p) {
        a_y.at(p) = double{ 1 };
        for (auto j = 0u; j < p; ++j) {
            a_y.at(p) *= a_x.at(j);
        }
    }

    // build the ADFun, with the alternative
    // "CppAD::ADFun<double> a_f; a_f.Dependent(a_x, a_y);"
    // there seems to be no problem
    CppAD::ADFun<double> a_f(a_x, a_y);

    a_f.optimize();

    std::vector<CppAD::ADFun<double>> v_funs(num_threads);
    std::fill(v_funs.begin(), v_funs.end(), a_f);

    for (auto loop = 0u; loop < 100u; ++loop) {
        std::cout << "loop: " << loop << std::endl;
#pragma omp parallel for
        for (int p = 1u; p < num_points; ++p) {
            const auto omp_get_thread_num_ = omp_get_thread_num();
            std::vector<double> x0(num_points, double{ 0 });
            v_funs[omp_get_thread_num_].SparseJacobian(x0);
        }
    }
}
bradbell commented 4 months ago

I have verified that I get the error with the current development version on my machine; i.e.,

cppad-20240306 error from a known source:
Attempt to return memory for a different thread while in parallel mode
Error detected by false result for
    thread == thread_num() || (! in_parallel())
at line 941 in the file 
    /home/bradbell/repo/cppad.git/include/cppad/utility/thread_alloc.hpp
...

I am working on some other things, but I will try to get to this soon.

bradbell commented 4 months ago

I have figured out what is going on, but I am not sure what we should do about this ? There are a number of problems brought up by this example that need improved documentation and perhaps improvements to the code:

When you create an ADFun with the constructor ADFun a_f(a_x, a_y) , a call to zero order foward mode is done; see Sequence Constructor on https://cppad.readthedocs.io/latest/fun_construct.html#sequence-constructor This store the corresponding Taylor coefficients in a_f.

In the function assignments in

std::fill(v_funs.begin(), v_funs.end(), f);

It allocates Taylor coefficient memory for each of the functions using thread zero. But inside the loop

for (size_t p = 0; p < num_threads; ++p) 

It tries to free that memory. If you do the following after the ADFun a_f(ax, ay) constructor, it will fix this problem

a_f.capacity_order(0)
bradbell commented 4 months ago

See the heading 03-09 on https://cppad.readthedocs.io/latest/2024.html#mm-dd-03-09

bradbell commented 3 months ago

I have done some more work on parallel mode examples and made some other changes to make it easier to use. See the headings 03-15 through 03-09 on https://cppad.readthedocs.io/latest/2024.html As you can see from the description, this found some bugs in the error detection during parallel mode.

@juanmanzanero If you are happy with this solution, please close this issue.

juanmanzanero commented 3 months ago

Thank you very much Brad. This solution works perfectly in my machine. Thanks for your time!