coin-or / Cbc

COIN-OR Branch-and-Cut solver
Other
747 stars 107 forks source link

CbcMain not thread safe #332

Open spoorendonk opened 3 years ago

spoorendonk commented 3 years ago

I am trying to run some MIP problems in parallel using the CbcMain0/CbcMain1 pattern. I am on https://github.com/coin-or/Cbc/tree/stable/2.10 Problem is both on Linux and Windows.

My code snippet is

    std::vector<int> probIndices(10);
    std::iota(probIndices.begin(), probIndices.end(), 0);
    std::for_each(std::execution::par,
                  std::begin(probIndices),
                  std::end(probIndices),
                  [&](int idx) {
                    OsiClpSolverInterface clp;
                      clp.readLp("folio10_7.lp");
                      clp.initialSolve();
                      clp.resolve();
                      CbcModel cbcModel(clp);
                      CbcMain0(cbcModel);

                      const char *argv[20];
                      int argc = 0;
                      std::string cbcExe = "cbc";
                      std::string cbcSolve = "-solve";
                      std::string cbcQuit = "-quit";
                      std::string cbcLog = "-log";
                      std::string cbcLogSet = "3";
                      std::string cbcGap = "-ratio";
                      std::string cbcGapSet = "0.05";
                      std::string cbcTime = "-seconds";
                      std::string cbcTimeSet = "1000";
                      std::string cbcCutoff = "-cutoff";
                      std::string cbcCutoffSet = "1000000";
                      argv[argc++] = cbcExe.c_str();
                      argv[argc++] = cbcLog.c_str();
                      argv[argc++] = cbcLogSet.c_str();
                      argv[argc++] = cbcGap.c_str();
                      argv[argc++] = cbcGapSet.c_str();
                      argv[argc++] = cbcTime.c_str();
                      argv[argc++] = cbcTimeSet.c_str();
                      argv[argc++] = cbcCutoff.c_str();
                      argv[argc++] = cbcCutoffSet.c_str();
                      argv[argc++] = cbcSolve.c_str();
                      argv[argc++] = cbcQuit.c_str();
                      CbcMain1(argc, argv, cbcModel);
                      if (cbcModel.status() == -1)
                      {
                          throw std::domain_error("CBC fail");
                      }
                  });
    return 0;

It fails when parsing the commandline parameters (I think) The output is like

...
Welcome to the CBC MILP Solver 
Version: 2.10 
Build Date: Sep  1 2020 

command line - cbc -log 3 -ratio 0.05 -seconds 1000 -cutoff 1000000 -solve -quit (default strategy 1)
logLevel was changed from 1 to 3
ratioGap was changed from 0 to 0.05
String of -log is illegal for double parameter seconds value remains -1
No match for 3 - ? for list of commands
Welcome to the CBC MILP Solver 
Version: 2.10 
Build Date: Sep  1 2020 

command line - cbc -log 3 -ratio 0.05 -seconds 1000 -cutoff 1000000 -solve -quit (default strategy 1)
ratioGap was changed from 0.05 to 0.05
String of -cutoff is illegal for double parameter seconds value remains No match for 1000 - ? for list of commands-1

No match for 1000000 - ? for list of commands

Running single threaded is no problem, that is, std::execution::seq.

I was not able to readily track this down in the CbcMain1 function, it is a rather complicated function. Anyone experienced this before?

tkralphs commented 3 years ago

It's not clear to me, but did you build with --enable-cbc-parallel? That's probably necessary to get a (probably) thread-safe version of the code. Otherwise, there are some global variables and things that mess with thread safety. I'm in the process of trying to eliminate all the global variables now to deal with this issue in the current development version.

spoorendonk commented 3 years ago

Yes --enable-cbc-parallel is on.

In the DIp code (stable/0.95) I found

https://github.com/coin-or/Dip/blob/b8f05414090729e28187126f3be96564ca3a7f60/Dip/src/DecompModel.cpp#L398-L409

and underneath it uses CbcMain for single threaded. I guess the same problem was encountered here. It works with the cbc.branchAndBound() in parallel. But that is not the recommended (most efficient) way as far as I understand?

tkralphs commented 3 years ago

Hmm, I don't think we did encounter issues the last time we experimented with it, but it's been a while. Since I'm digging around in the code right now anyway, I'll see if I can see any fix for this. Yes, the results will likely not be as good when calling cbc.branchAndBound() directly.

spoorendonk commented 3 years ago

Cool. That would be super helpful

jjhforrest commented 3 years ago

I think you just have to add -DCBC_THREAD_SAFE to your configuration. Without that I could get errors similar to the ones you describe.

I could not get std::execution::par to run but going back to pthreads the following worked with sample parameters of -log 1 -ratio 0.001 -seconds 1000 -cutoff 1.0e8 -solve

My code was

int main(int argc, const char *argv[]) {

define NUM_PROB 10

OsiClpSolverInterface solver[NUM_PROB]; CbcModel model[NUM_PROB]; pthread_t threadId[NUM_PROB]={0};; const char mpsFiles[NUM_PROB] = {"10teams", "enigma", "misc07", "stein45", "air03","bell3a","bell5","dcmulti", "gesa2","p2756"}; for (int iModel = 0;iModel < NUM_PROB;iModel++) { char name[50]; strcpy(name,"~/miplib3/"); strcat(name,mpsFiles[iModel]); int numMpsReadErrors = solver[iModel].readMps(name, ""); if (numMpsReadErrors != 0) { printf("%d errors reading MPS file\n", numMpsReadErrors); return numMpsReadErrors; } } for (int iModel = 0;iModel < NUM_PROB;iModel++) { pthread_create(&threadId[iModel], NULL, doLpThread,&solver[iModel]); } // wait for (int iModel = 0;iModel < NUM_PROB;iModel++) pthread_join(threadId[iModel],NULL); // do MIP mipBundle bundle[NUM_PROB]; for (int iModel = 0;iModel < NUM_PROB;iModel++) { // back to old printing solver[iModel].getModelPtr()->setMinIntervalProgressUpdate(-1000.0); model[iModel]= new CbcModel(solver[iModel]); bundle[iModel].model = model[iModel]; bundle[iModel].argc = argc; bundle[iModel].argv = argv; } for (int iModel = 0;iModel < NUM_PROB;iModel++) { pthread_create(&threadId[iModel], NULL, doMipThread,&bundle[iModel]); } // wait for (int iModel = 0;iModel < NUM_PROB;iModel++) { pthread_join(threadId[iModel],NULL); printf("Model %d finished\n",iModel); } for (int iModel = 0;iModel < NUM_PROB;iModel++) printf("Model %d took %d nodes with objective %g\n", iModel,model[iModel]->getNodeCount(), model[iModel]->getObjValue()); return 0; }

John Forrest

On 07/09/2020 15:14, Simon Spoorendonk wrote:

I am trying to run some MIP problems in parallel using the |CbcMain0/CbcMain1| pattern. I am on https://github.com/coin-or/Cbc/tree/stable/2.10 Problem is both on Linux and Windows.

My code snippet is

 std::vector<int> probIndices(10);
 std::iota(probIndices.begin(), probIndices.end(), 0);
 std::for_each(std::execution::par,
               std::begin(probIndices),
               std::end(probIndices),
               [&](int  idx) {
                 OsiClpSolverInterface clp;
                   clp.readLp("folio10_7.lp");
                   clp.initialSolve();
                   clp.resolve();
                   CbcModelcbcModel(clp);
                   CbcMain0(cbcModel);

                   const  char  *argv[20];
                   int  argc =0;
                   std::string cbcExe ="cbc";
                   std::string cbcSolve ="-solve";
                   std::string cbcQuit ="-quit";
                   std::string cbcLog ="-log";
                   std::string cbcLogSet ="3";
                   std::string cbcGap ="-ratio";
                   std::string cbcGapSet ="0.05";
                   std::string cbcTime ="-seconds";
                   std::string cbcTimeSet ="1000";
                   std::string cbcCutoff ="-cutoff";
                   std::string cbcCutoffSet ="1000000";
                   argv[argc++] = cbcExe.c_str();
                   argv[argc++] = cbcLog.c_str();
                   argv[argc++] = cbcLogSet.c_str();
                   argv[argc++] = cbcGap.c_str();
                   argv[argc++] = cbcGapSet.c_str();
                   argv[argc++] = cbcTime.c_str();
                   argv[argc++] = cbcTimeSet.c_str();
                   argv[argc++] = cbcCutoff.c_str();
                   argv[argc++] = cbcCutoffSet.c_str();
                   argv[argc++] = cbcSolve.c_str();
                   argv[argc++] = cbcQuit.c_str();
                   CbcMain1(argc, argv, cbcModel);
                   if  (cbcModel.status() == -1)
                   {
                       throw  std::domain_error("CBC fail");
                   }
               });
 return  0;

It fails when parsing the commandline parameters (I think) The output is like

... Welcome to the CBC MILP Solver Version: 2.10 Build Date: Sep 1 2020

command line - cbc -log 3 -ratio 0.05 -seconds 1000 -cutoff 1000000 -solve -quit (default strategy 1) logLevel was changed from 1 to 3 ratioGap was changed from 0 to 0.05 String of -log is illegalfor double parameter seconds value remains -1 No matchfor 3 -? for list of commands Welcome to the CBC MILP Solver Version: 2.10 Build Date: Sep 1 2020

command line - cbc -log 3 -ratio 0.05 -seconds 1000 -cutoff 1000000 -solve -quit (default strategy 1) ratioGap was changed from 0.05 to 0.05 String of -cutoff is illegalfor double parameter seconds value remains No matchfor 1000 -? for list of commands-1

No matchfor 1000000 -? for list of commands

Running single threaded is no problem, that is, |std::execution::seq|.

I was not able to readily track this down in the |CbcMain1| function, it is a rather complicated function. Anyone experienced this before?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/coin-or/Cbc/issues/332, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABWJYHCFARNBUZM6LB6TAD3SETTDDANCNFSM4Q6JEUSQ.

tkralphs commented 3 years ago

@jjhforrest Ah, yes, any reason not to change CBC_THREAD_SAFE to CBC_THREAD so that we get thread safe code by default?

spoorendonk commented 3 years ago

Excellent. It seems to solve my problem.

When I build static libs are there other defines I should remember to include when linking coin into my executable? Is it documented somewhere?

svigerske commented 3 years ago

@jjhforrest Ah, yes, any reason not to change CBC_THREAD_SAFE to CBC_THREAD so that we get thread safe code by default?

Even if one disabled multithreading for Cbc you want Cbc to be thread-safe - always and everytime. That is, I don't understand why there is an ifdef at all.

jjhforrest commented 3 years ago

Stefan,

I have no idea when that #ifdef appeared. I presume that whoever wrote the code (which might be me years ago) was not sure if the code included by the ifdef was correct and did not want to create any bugs.

John On 08/09/2020 16:11, Stefan Vigerske wrote:

@jjhforrest <https://github.com/jjhforrest> Ah, yes, any reason not
to change |CBC_THREAD_SAFE| to |CBC_THREAD| so that we get thread
safe code by default?

Even if one disabled multithreading for Cbc you want Cbc to be thread-safe - always and everytime. That is, I don't understand why there is an |ifdef| at all.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/coin-or/Cbc/issues/332#issuecomment-688943958, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABWJYHEKXU7DI4MKE5L4ZALSEZCS7ANCNFSM4Q6JEUSQ.

tkralphs commented 3 years ago

The stuff inside CBC_THREAD_SAFE is a quick hack. I'm trying to clean this up so that we'll always have thread safety, as @svigerske said.

spoorendonk commented 3 years ago

When I build static libs are there other defines I should remember to include when linking coin into my executable? Is it documented somewhere?

I am adding

            "CBC_THREAD",
            "COIN_HAS_COINUTILS",
            "COIN_HAS_OSI",
            "COIN_HAS_CLP",
            "COIN_HAS_CGL",
            "COIN_HAS_CBC",

as defines for now when linking against Cbc. Is there anything else I should add?

tkralphs commented 3 years ago

If you're using the autotools, it shouldn't be necessary to define any of those symbols. The COIN_HAS_XXX variables should be defined automatically in CbcConfig.h (or a header included by it). CBC_THREAD is also auto-defined if you are building with --enable-cbc-parallel. There are some symbols you can define to turn on experimental stuff or that enable fixes for some edge case, but generally speaking, the symbol definition is taken care of by configure when necessary.

spoorendonk commented 3 years ago

My CbcConfig.h only has

/* src/config_cbc.h.  Generated by configure.  */
/* src/config_cbc.h.in. */

#ifndef CBC_VERSION

/* Version number of project */
#define CBC_VERSION "2.10"

/* Major Version number of project */
#define CBC_VERSION_MAJOR 2

/* Minor Version number of project */
#define CBC_VERSION_MINOR 10

/* Release Version number of project */
#define CBC_VERSION_RELEASE 9999

#endif

When I search the headers of dist/include I see nothing like #define COIN_HAS_CLP 1 which is added to the build/Cbc/2.10/src/config.h by the configure script (and not copied into CbcConfig.h?).

There is a lot of "#ifdef COIN_HAS_CLP" in other headers in dist/include. Maybe my c++ build skills are not super tight, but I am wondering if I link with the libs in dist/lib and include headers dist/include could I run into a problem if I do not explicitly do -DCOIN_HAS_CLP=1?

I am not using autotools but cmake when linking coin to my own code.

svigerske commented 3 years ago

It's a sign for unclean code if the Cbc API header have ifdef's like these. If that is in parts of headers that are relevant for a user of Cbc, then that can be a bug, too. The user is certainly not expected to guess a number of COIN_HAS_XYZ to be defined manually. This won't be fixed in 2.10 anymore, though, and @tkralphs is doing some cleanup based on master that may also reduce the appearance of such ifdef's in installed header files (until someone puts them back in again).

spoorendonk commented 3 years ago

Excellent. It seems to solve my problem.

So the -DCBC_THREAD_SAFE it worked in debug mode not in release (!?). Can't figure out what the difference is. However I look at

https://github.com/coin-or/Cbc/blob/b127c88b9f53e36197d1542f2e65fec3ebb2cb04/Cbc/src/CbcSolver.cpp#L1351-L1354

and then the usage in e.g.

https://github.com/coin-or/Cbc/blob/b127c88b9f53e36197d1542f2e65fec3ebb2cb04/Cbc/src/CbcSolver.cpp#L1193-L1204

This is not thread safe is it? Anyone can increment whichArgument?

spoorendonk commented 3 years ago

Embarrassing.

Of course it was my compile definitions that got screwed up. -DCBC_THRED_SAFE works ok.

dlaehnemann commented 3 years ago

I can confirm, that adding -DCBC_THREAD_SAFE to a coinbrew install avoids race conditions between separate instances (with separate problems) of calling the C API that run at the same time. For anybody looking for the exact invocation, here's what worked for me:

coinbrew build Cbc@master ADD_CXXFLAGS=-DCBC_THREAD_SAFE --no-prompt

And I guess as the cleanup in the refactor branch removes the #ifdef CBC_THREAD_SAFE, this will also take care of the thread safety in Cbc?

tkralphs commented 3 years ago

The cleanup in the refactor branch should take care of thread safety, yes. There will be no need to build with CBC_THREAD_SAFE anymore.