Gecode / gecode

Generic Constraint Development Environment
https://www.gecode.org
Other
275 stars 76 forks source link

Memory leak of solutions at Engine in a multi-threading context #156

Open alex-87 opened 1 year ago

alex-87 commented 1 year ago

Description

There is a memory leak at Engine in a multi-threading context when the following conditions are met:

To Reproduce

To reproduce the issue, here is the following code (based on the SendMoreMoney example code). The code is modified in order to have more than one solution :

#include <gecode/int.hh>
#include <gecode/search.hh>

using namespace Gecode;

class SendMoreMoney : public Space {
protected:
  IntVarArray l;
public:
  SendMoreMoney(void) : l(*this, 8, 0, 9) {
    IntVar s(l[0]), e(l[1]), n(l[2]), d(l[3]),
           m(l[4]), o(l[5]), r(l[6]), y(l[7]);
    // no leading zeros
//    rel(*this, s, IRT_NQ, 0);
//    rel(*this, m, IRT_NQ, 0);
    // all letters distinct
//    distinct(*this, l);
    // linear equation
    IntArgs c(4+4+5); IntVarArgs x(4+4+5);
    c[0]=1000; c[1]=100; c[2]=10; c[3]=1;
    x[0]=s;    x[1]=e;   x[2]=n;  x[3]=d;
    c[4]=1000; c[5]=100; c[6]=10; c[7]=1;
    x[4]=m;    x[5]=o;   x[6]=r;  x[7]=e;
    c[8]=-10000; c[9]=-1000; c[10]=-100; c[11]=-10; c[12]=-1;
    x[8]=m;      x[9]=o;     x[10]=n;    x[11]=e;   x[12]=y;
    linear(*this, c, x, IRT_EQ, 0);
    // post branching
    branch(*this, l, INT_VAR_SIZE_MAX(), INT_VAL_MAX());
  }
  // search support
  SendMoreMoney(SendMoreMoney& s) : Space(s) {
    l.update(*this, s.l);
  }
  virtual Space* copy(void) {
    //std::cout << "Clone occurred" << std::endl;
    return new SendMoreMoney(*this);
  }
  // print solution
  void print(void) const {
    std::cout << l << std::endl;
  }

  ~SendMoreMoney() {
    //std::cout << "Destructor Ok" << std::endl;
  }
};

// main function
int main(int argc, char* argv[]) {

  Gecode::Search::Options opt;
  opt.threads = 8;
  opt.c_d     = 1;

  // create model and search engine
  SendMoreMoney* m = new SendMoreMoney;
  DFS<SendMoreMoney> e(m, opt);
  delete m;
  // search and print all solutions
  if (SendMoreMoney* s = e.next()) {
    s->print(); delete s;
  }
  return 0;
}

Compilation with Debug mode:

g++ -g -std=c++11 -o smm ./send-more-money.cpp -lgecodesearch -lgecodeint -lgecodekernel -lgecodesupport

Platform

Valgrind output

...
==4005193== 2,714,208 (350,816 direct, 2,363,392 indirect) bytes in 1,154 blocks are definitely lost in loss record 11 of 11
==4005193==    at 0x4E05045: malloc (vg_replace_malloc.c:381)
==4005193==    by 0x155216: alloc (allocator.hpp:80)
==4005193==    by 0x155216: ralloc (heap.hpp:358)
==4005193==    by 0x155216: operator new (heap.hpp:416)
==4005193==    by 0x155216: SendMoreMoney::copy() (send-more-money.cpp:37)
==4005193==    by 0x274063: Gecode::Space::_clone() (core.cpp:757)
==4005193==    by 0x15E83F: clone (core.hpp:3228)
==4005193==    by 0x15E83F: Gecode::Search::Par::DFS<Gecode::Search::NoTraceRecorder>::Worker::run() (dfs.hpp:249)
==4005193==    by 0x2792F4: Gecode::Support::Thread::Run::exec() (thread.cpp:60)
==4005193==    by 0x2793EE: Gecode::Support::bootstrap(void*) (pthreads.cpp:43)
==4005193==    by 0x52D4B42: start_thread (pthread_create.c:442)
==4005193==    by 0x5365BB3: clone (clone.S:100)
==4005193== 
==4005193== LEAK SUMMARY:
==4005193==    definitely lost: 350,816 bytes in 1,154 blocks
==4005193==    indirectly lost: 2,363,392 bytes in 1,154 blocks
==4005193==      possibly lost: 137,544 bytes in 11 blocks
==4005193==    still reachable: 1,256 bytes in 9 blocks
==4005193==         suppressed: 0 bytes in 0 blocks
==4005193== Reachable blocks (those to which a pointer was found) are not shown.

Analysis

After a short analysis, I found out that:

  1. In gecode/search/par/engine.hh, the Support::DynamicQueue<Space*,Heap> solutions attribute contains pointers to the solutions that are pushed by workers during the search phase.
  2. The solutions is consumed through the Engine<Tracer>::next(void) method.
  3. If the solutions aren't consumed, the Space* objects remain in the solutions queue. The Space* objects are generated by Space* s = cur->clone(); and pushed to the solutions queue when a solution is found.
  4. At the end of processing, the destructor of the Support::DynamicQueue<Space*, Heap> solutions object frees its memory through a.free(q,limit)
  5. However, unconsumed Space* objects are definitely lost as their references are removed by the previous operation.

Draft solution To explicitly consume remaining Space* through solution.pop() then delete those objects when the destructor of Engine<Tracer> is called.

zayenz commented 1 year ago

Thanks for this great error report, reproduction, and analysis!

On a cursory look, I would agree with your analysis fully. Will check more fully later.