coin-or / python-mip

Python-MIP: collection of Python tools for the modeling and solution of Mixed-Integer Linear programs
Eclipse Public License 2.0

optimize n-queens: Performance of pre-compiled cbc vs source build #393

Open lrbison opened 2 weeks ago

lrbison commented 2 weeks ago

Describe the bug

I find that the build-from-source instructions result in a binary that is nearly twice as slow as the checked-in cbc-c-linux-x86-64.so.

My motivation for compiling from source is to run on an aarch64 host (Graviton3: AWS c7g instance); the pre-compiled .so for cbc doesn't include aarch64 binaries. Testing there showed poor performance, but when I take the same steps on x86 I find that none of the flag or compiler variations I tried can match the pre-compiled binary, so this isn't so much a "slow on arm" problem as a "slow when I build from source" problem.

The test my customer pointed me to is the n-queens test. However, they modified your queens.py to include an optimize() call, which adds significant time (a sketch of the modified benchmark is below). Does this make sense as a benchmark, since solutions are either valid or not (but not "optimizable")?
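For concreteness, here is a minimal sketch of the modified benchmark, assuming it is the n-queens model from the python-mip documentation with optimize() appended (n and the constraint names are illustrative):

```python
from mip import Model, xsum, BINARY

n = 200
queens = Model()

# x[i][j] == 1 iff a queen is placed on square (i, j)
x = [[queens.add_var(f"x({i},{j})", var_type=BINARY)
      for j in range(n)] for i in range(n)]

# exactly one queen per row and per column
for i in range(n):
    queens += xsum(x[i][j] for j in range(n)) == 1, f"row({i})"
for j in range(n):
    queens += xsum(x[i][j] for i in range(n)) == 1, f"col({j})"

# at most one queen per diagonal (i - j constant) ...
for k in range(2 - n, n):
    queens += xsum(x[i][i - k] for i in range(n)
                   if 0 <= i - k < n) <= 1, f"diag({k})"

# ... and per anti-diagonal (i + j constant)
for s in range(1, 2 * n - 2):
    queens += xsum(x[i][s - i] for i in range(n)
                   if 0 <= s - i < n) <= 1, f"adiag({s})"

# the added call: the model has no objective, so this is purely a
# feasibility search, which is why "optimizable" is a fair question
queens.optimize()
```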

To Reproduce

I have tried two approaches: building cbc from source and using the provided .so. After including optimize() in queens.py, for n=200 the solve takes about 60 seconds when I build from source on c6i (Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz) and about 20 seconds when I use the provided .so.
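To be explicit about what is being timed, a hedged sketch of the measurement, reusing the queens model from the sketch above; python-mip honors the PMIP_CBC_LIBRARY environment variable, which is how I assume one would point it at a specific Cbc build:

```python
import time

# To A/B the two builds, select the Cbc shared library before `import mip`:
#   PMIP_CBC_LIBRARY=/path/to/cbc-c-linux-x86-64.so python queens_bench.py
start = time.perf_counter()
status = queens.optimize()
print(f"optimize() wall time: {time.perf_counter() - start:.1f} s ({status})")
```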

I also found https://github.com/coin-or/python-mip/issues/215#issuecomment-955792305, but I don't know which versions or flags would let me reproduce that binary. It seems it was last updated 3 years ago; is it possible Cbc itself had a regression in that time, or do I need to use particular versions? I also see a reference in the Benchmarks page to "we implemented an automatic buffering/flushing mechanism in the CBC C Interface". Is this included in Cbc master now?


lrbison commented 2 weeks ago

We found the source of the performance difference: the new default for CBC_PREPROCESS_EXPERIMENT, set around the time of https://github.com/coin-or/CoinUtils/commit/293e6e981774ed047e8f00f7aff9252262f83a02 (Dec 2023). Resetting it back to 0 with a -D in the CFLAGS restores the performance of the n-queens optimize() call (see the sketch below).
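For anyone else hitting this, the override looks roughly like the following; ADD_CFLAGS/ADD_CXXFLAGS are my assumption for how COIN-OR's configure forwards extra compiler flags, so adapt to your build setup:

```sh
# Hedged sketch: define the macro back to 0 when configuring the
# Cbc/CoinUtils build; exact flag-forwarding may differ per setup.
./configure ADD_CFLAGS="-DCBC_PREPROCESS_EXPERIMENT=0" \
            ADD_CXXFLAGS="-DCBC_PREPROCESS_EXPERIMENT=0"
make && make install
```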

However, I did check a few trials on a traveling salesman example, and those tests were unaffected by the CBC_PREPROCESS_EXPERIMENT macro.

Last: the Benchmarks page does not include the optimize() call, but the Model Example n-Queens does, and the latter was the original source of our benchmark. Would it be appropriate to remove the call from the Model Example documentation?