coin-or / python-mip

Python-MIP: collection of Python tools for the modeling and solution of Mixed-Integer Linear programs
Eclipse Public License 2.0

Optimize LinExpr multiplication by one or zero #98

Closed bonelli closed 4 years ago

bonelli commented 4 years ago

I'm working on a problem that is mathematically defined with a couple of big sparse binary matrices. I'm using quite a lot of python-mip binary variables, which I collected into numpy tensors to get some syntactic sugar and matrix operations over them.

It's very nice but very, very slow, and it seems to me that some basic optimization could make it perform much better, notably in the LinExpr.__mul__() function.

Would it be reasonable to add special cases for multiplication by 1 or 0 to speed up this kind of use?
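For illustration, the proposed fast paths might look like this. This is a toy sketch with a simplified stand-in class, not python-mip's actual LinExpr implementation:

```python
# Toy sketch of the proposed fast paths; this LinExpr is a simplified
# stand-in, not python-mip's actual implementation.
class LinExpr:
    def __init__(self, coeffs=None, const=0.0):
        self.expr = dict(coeffs or {})  # variable -> coefficient
        self.const = const

    def copy(self):
        return LinExpr(self.expr, self.const)

    def __mul__(self, other):
        if isinstance(other, (int, float)):
            if other == 0:
                return LinExpr()   # x * 0 -> empty expression, no scaling loop
            if other == 1:
                return self.copy() # x * 1 -> cheap copy, no scaling loop
        result = self.copy()
        result.const *= other
        for var in result.expr:
            result.expr[var] *= other
        return result
```

Returning a copy (rather than `self`) for multiplication by 1 keeps the fast path safe in the presence of in-place operators, at the cost of one dict copy.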

h-g-s commented 4 years ago

Hi Federico,

Yes, we are aware that LinExpr is probably the part of Python-MIP where optimizations are most needed right now.

Let me explain why it is slow now; maybe you can think of something to speed it up (I've tried some approaches, without much success).

If you enter an expression like 2 x1 + 5 x2 + 1 x1, then as Python parses it, several increasingly larger LinExpr objects are created, so part of the cost is the creation of all those objects. Also, coefficients of the same variable need to be grouped (in our example, to 3 x1 + 5 x2) before the expression is sent to CBC. Thus, to update variables' coefficients "quickly" (in big-O terms), we keep a dictionary mapping variables to coefficients.
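The grouping idea can be sketched as follows. The names here are illustrative, not python-mip's actual internals:

```python
# Toy sketch of coefficient grouping with a dict: duplicate terms for
# the same variable are merged in O(1) per term.
def add_term(coeffs, var, coeff):
    """Accumulate `coeff` for `var`, merging duplicates as we go."""
    coeffs[var] = coeffs.get(var, 0) + coeff

expr = {}
for var, coeff in [("x1", 2), ("x2", 5), ("x1", 1)]:  # 2 x1 + 5 x2 + 1 x1
    add_term(expr, var, coeff)
# expr is now {"x1": 3, "x2": 5}, i.e. 3 x1 + 5 x2
```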

Are you using CPython or PyPy?

Could you provide a profile from your code?

Cheers

--

Haroldo Gambini Santos
Computing Department, Universidade Federal de Ouro Preto - UFOP
email: haroldo@ufop.edu.br / Haroldo.GambiniSantos@cs.kuleuven.be
home/research page: www.decom.ufop.br/haroldo

It has long been an axiom of mine that the little things are infinitely the most important. -- Sir Arthur Conan Doyle, "A Case of Identity"


bonelli commented 4 years ago

Hi @h-g-s, thanks for your answer.

I understand the situation. I'm using CPython, and I've started profiling to see whether I can be of any help.

Here is an extract of the profile results for the heaviest matrix operation, which is a numpy.matmul between a numpy array containing python-mip variables and another with dtype=int containing 0s and 1s.

```
ncalls            tottime  percall    cumtime  percall    filename:lineno(function)
3099193           36.14    1.166e-05  62.22    2.008e-05  entities.py:170(__mul__)
32640914          29.01    8.888e-07  40       1.225e-06  entities.py:310(add_var)
1                 26.79    26.79      26.79    26.79      default.py:488(connect)
145835132         24.36    1.67e-07   24.36    1.67e-07   entities.py:528(__hash__)
2616313           20.25    7.739e-06  64.66    2.471e-05  entities.py:288(add_expr)
5638618           7.923    1.405e-06  14.23    2.523e-06  entities.py:320(copy)
1                 4.976    4.976      144      144        foo.py:166(D)
2622175           4.845    1.848e-06  4.845    1.848e-06  ~:0(<method 'items' of 'dict' objects>)
2539173           4.698    1.85e-06   76.69    3.02e-05   entities.py:106(__add__)
1                 4.233    4.233      4.233    4.233      ~:0()
5638653           3.536    6.271e-07  3.536    6.271e-07  ~:0(<method 'copy' of 'dict' objects>)
8760991/8760990   3.278    3.741e-07  5.999    6.847e-07  ~:0()
5783502           3.234    5.593e-07  3.559    6.154e-07  entities.py:85(__init__)
72/71             1.829    0.02576    1.831    0.02579    ~:0()
3240098           1.461    4.51e-07   2.695    8.318e-07  abc.py:96(__instancecheck__)
3240098           1.232    3.802e-07  1.234    3.808e-07  ~:0()
```

I cut the results off at entries with 1 second or more of tottime.

Attached is a plot (profile_viz) of the same profiling run.
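The shape of the bottleneck can be reproduced without python-mip at all: with an object-dtype array, numpy.matmul falls back to Python-level `__mul__`/`__add__` calls, one per term. In this self-contained sketch, `Var` is a stand-in for a python-mip variable:

```python
# Why the matmul is slow: on object-dtype arrays, numpy.matmul makes
# one Python-level __mul__ call per (i, k, j) term of the product.
# `Var` is a toy stand-in for a python-mip variable.
import numpy as np

class Var:
    calls = 0
    def __mul__(self, other):
        Var.calls += 1          # count Python-level multiplications
        return self
    __rmul__ = __mul__
    def __add__(self, other):
        return self
    __radd__ = __add__

n = 10
x = np.array([[Var() for _ in range(n)] for _ in range(n)], dtype=object)
d = np.ones((n, n), dtype=int)  # stand-in for the 0/1 matrix
_ = x @ d
print(Var.calls)                # n**3 Python-level __mul__ calls
```

With real LinExpr objects each of those calls also allocates and copies dicts, which is where the time in the profile goes.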

tkralphs commented 4 years ago

If you already have all your input data as numpy matrices, you probably just want to use CyLP. Someone with basically the same issue just reported reducing the time to get the instance into Cbc from 10 minutes with python-mip down to 10 seconds with CyLP (see https://github.com/coin-or/CyLP/issues/95).

bonelli commented 4 years ago

Thanks, I'll look into CyLP, but I like Python-MIP so far, so I'll give it a good try before switching.

I tried a basic optimization against the master branch.

It improves my case of numpy.matmul with a binary integer matrix significantly; here are some measurements:

not optimized:

```
2020-05-13 00:55:03,827 INFO foo.bar - Starting D multiplication
2020-05-13 01:03:10,727 INFO foo.bar - Finishing D multiplication
```

= 487 secs

optimized:

```
2020-05-13 00:49:26,222 INFO foo.bar - Starting D multiplication
2020-05-13 00:50:54,799 INFO foo.bar - Finishing D multiplication
```

= 88 secs

For that same operation, in this particular case, I get a speedup of about 5.5x.

Do you think this is worth a PR?

h-g-s commented 4 years ago

Hi @bonelli , your optimization seems good, please make a PR.

I'm thinking of some optimizations here that I can test, but I would need a code example where this bottleneck appears. Do you think you could write a simpler piece of code that exposes the same performance bottleneck? Then I could test my ideas and see their effect in an environment similar to yours.

@tkralphs, I didn't know that the matrix interface could be so much faster; I'll look into how to provide a similar input in Python-MIP.

jurasofish commented 4 years ago

I was thinking of making the same optimisation, but wouldn't the following code result in issues? `a` would end up being multiplied by 3, correct? I haven't tested it.

```python
a = m.add_var()
b = a * 1
b *= 3
```

bonelli commented 4 years ago

> I was thinking of making the same optimisation, but wouldn't the following code result in issues? `a` would end up being multiplied by 3, correct? Haven't tested.
>
> ```python
> a = m.add_var()
> b = a * 1
> b *= 3
> ```

If the `*=` operator works in place (I think it does), that's correct: `a` would end up multiplied by 3.

Frankly, with this kind of mathematical application, I believe in-place operations are so misleading that they shouldn't be allowed.

Mathematical variables/constants are defined once and immutable after that; encouraging this kind of imperative programming in a math/optimization library is something I wouldn't do.
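The aliasing hazard under discussion can be demonstrated with a toy expression class (not python-mip's actual LinExpr): if `__mul__` returned the same object for multiplication by 1, an in-place `*=` on the result would silently mutate the original.

```python
# Toy demonstration of the aliasing hazard: an "optimized" x * 1 that
# returns the same object breaks in the presence of in-place *=.
class Expr:
    def __init__(self, coeff=1.0):
        self.coeff = coeff
    def __mul__(self, k):
        if k == 1:
            return self              # unsafe fast path: returns an alias
        return Expr(self.coeff * k)
    def __imul__(self, k):
        self.coeff *= k              # in-place update
        return self

a = Expr()
b = a * 1   # b is the very same object as a
b *= 3
print(a.coeff)  # 3.0 -- `a` was mutated through `b`
```

Returning `self.copy()` instead of `self` in the `k == 1` branch keeps the fast path while avoiding the mutation.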

bonelli commented 4 years ago

> I'm thinking of some optimizations here that I can test, but I would need a code example where this bottleneck appears. Do you think you could write a simpler piece of code that exposes the same performance bottleneck?

@h-g-s sure, I'll make a PR and post some examples of slow numpy operations with python-mip

jurasofish commented 4 years ago

https://github.com/coin-or/python-mip/blob/0eb7f7c0104447ed8edee7fa846210e2022ab913/mip/entities.py#L185

Yes, that's right; it's an in-place operation. Removing in-place ops would of course be a severe backwards-compatibility break (perhaps one for version 2). I'd support it, though.

h-g-s commented 4 years ago

Hi @bonelli, @tkralphs, I'll add a matrix interface so that rows can be queried and inserted as numpy arrays; then I think it will be as fast as it can be (with not-so-pretty code, however).

bonelli commented 4 years ago

> I'll add a matrix interface, so that rows could be queried and inserted as numpy arrays, then I think that it will be as fast as it can be (with a not so pretty code, however)

With all due respect, I don't think it's useful to add matrix notation if it's going to be just syntactic sugar. I simply arranged normal variables inside a numpy array, and the matrix notation was already done; as a bonus, I get all the matrix operations I need, already implemented in numpy.

If, on the other hand, you're thinking of implementing faster versions of the LinExpr operations that take advantage of the matrix structure (as you did with xsum for iterables), then I'm all for it.
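The win from an xsum-style single pass, as opposed to repeated `+`, can be illustrated with a toy class (not python-mip's actual xsum): summing with `+` allocates a new expression per term, while an accumulator mutates one object.

```python
# Toy illustration of the idea behind a single-pass sum (like xsum)
# versus repeated `+`: object allocations per summation. `Expr` is a
# stand-in, not python-mip's LinExpr.
class Expr:
    instances = 0
    def __init__(self, coeffs=None):
        Expr.instances += 1
        self.coeffs = dict(coeffs or {})
    def __add__(self, other):
        merged = dict(self.coeffs)
        for v, c in getattr(other, "coeffs", {}).items():
            merged[v] = merged.get(v, 0) + c
        return Expr(merged)
    __radd__ = __add__

terms = [Expr({i: 1}) for i in range(100)]

Expr.instances = 0
total = sum(terms, Expr())   # one fresh Expr per `+`: 101 allocations
naive_allocs = Expr.instances

def toy_xsum(terms):
    acc = Expr()             # a single Expr, updated in place
    for t in terms:
        for v, c in t.coeffs.items():
            acc.coeffs[v] = acc.coeffs.get(v, 0) + c
    return acc

Expr.instances = 0
total2 = toy_xsum(terms)
fast_allocs = Expr.instances
```

The same trick applied to matrix products (accumulating each output entry in one expression) is what would make a matrix-aware fast path worthwhile.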

Sorry I couldn't make the PR today; I didn't have time, even though it's high on my priority list.

h-g-s commented 4 years ago

A faster LinExpr is the priority, but my idea is also to make available a low-level interface to communicate with the solver. With CFFI it is possible to allocate C arrays that can be passed directly to CBC, for example; this would be a "last resort" for those who want maximum performance.
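The low-level idea is to build C arrays in Python and hand them straight to the solver's C API. Python-MIP uses CFFI for this; the sketch below uses the standard library's ctypes instead, to stay self-contained, and omits the actual solver call:

```python
# Sketch of the low-level idea: C arrays built directly from Python,
# suitable for passing to a solver's C API with no per-element Python
# overhead. Python-MIP itself uses CFFI; ctypes is used here only to
# keep the sketch dependency-free. The solver call is omitted.
import ctypes

n = 3
# Column indices and coefficients of one constraint row, as C arrays.
idx = (ctypes.c_int * n)(0, 1, 2)
coef = (ctypes.c_double * n)(1.0, 2.5, -1.0)

# Buffers like these could be handed to a C entry point of the solver
# (e.g. a row-adding function in CBC's C interface) in a single call.
print(list(idx), list(coef))
```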

bonelli commented 4 years ago

Here is the PR for reference: #100. Once it's released, it solves this issue for my part of the problem.

bonelli commented 4 years ago

Release 1.9.0 resolves this issue for me; I'd close it.