Foggalong / RobustOCS

Robust optimal contribution selection problems for genetics with Python
https://pypi.org/project/robustocs
MIT License

SciPy arrays and typing tweaks #22

Closed · Foggalong closed 1 month ago

Foggalong commented 1 month ago

Started addressing #21 and it seems to have made the module a fraction slower. The difference is only really appreciable at $n = 1000$, but it's definitely above the variation we had been seeing. For SQP with HiGHS it's only a 1% slowdown, but it's 3% for the non-robust methods and 7% for SQP with Gurobi.

| Commit | Gurobi (standard) | HiGHS (standard) | Gurobi (conic) | Gurobi (SQP) | HiGHS (SQP) | Total |
| --- | --- | --- | --- | --- | --- | --- |
| 3433097 | 0.676 | 0.204 | 2.760 | 24.400 | 1.700 | 508.1 |
| fe08695 | 0.697 | 0.210 | 2.820 | 26.200 | 1.720 | 529.2 |

Since I made a few other typing-related changes at the same time, before pressing ahead I want to isolate which of the following are responsible:

  1. switching from scipy.sparse.spmatrix to scipy.sparse.sparray. If this is what's responsible, there's not much we can do since SciPy is (eventually) deprecating spmatrix, though if it does make a difference we could hold off on switching until the deprecation is looming.
  2. genericising types from np.int8 and np.float64 to np.integer and np.floating. This was a good-practice change, since requiring the end user to work at 64-bit precision isn't ideal, but it's possible NumPy is now spending time deciding which C precision to convert the Python values to rather than just being told "this one". If that's the case, this could be exposed as an optional dtype argument which defaults to 64 bit (see the sketch after this list).
  3. swapping dtype=float for dtype=np.floating. I'd have thought this was a speed-up rather than a slowdown, because before doing an operation NumPy converts float to a C precision anyway; this change should just pre-empt that, but maybe I implemented it wrong.
  4. swapping dtype=np.int8 for dtype=np.bool for the matrix $M$ in $M\mathbf{w} = \mathbf{m}$. The idea here was to save storage: since it's a binary matrix, using a bool type means only storing one byte per entry. However, it's possible the conversion NumPy goes through when $M$ is used in an operation is adding some overhead.
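
As a minimal sketch of the optional dtype idea in item 2, the precision could be an argument that defaults to 64 bit. The function name and signature below are illustrative only, not existing RobustOCS code.

```python
import numpy as np
import numpy.typing as npt


# Hypothetical loader illustrating item 2: annotations stay generic
# (np.floating), but the concrete precision is an argument defaulting
# to 64 bit, so NumPy isn't left to infer a C type on every operation.
def load_matrix(values, dtype: npt.DTypeLike = np.float64
                ) -> npt.NDArray[np.floating]:
    """Return `values` as a NumPy array at the requested precision."""
    return np.asarray(values, dtype=dtype)


# the default keeps the current 64-bit behaviour...
A = load_matrix([[1.0, 0.5], [0.5, 1.0]])
# ...but users can opt into lower precision explicitly
A32 = load_matrix([[1.0, 0.5], [0.5, 1.0]], dtype=np.float32)
```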

TL;DR: made some changes and it got slower, but I have an idea who the culprits are. Will try to minimize the impact before merging.

Foggalong commented 1 month ago

Confirmed this morning that the slowdown is coming from the switch from spmatrix to sparray.

| Changes | Gurobi (standard) | HiGHS (standard) | Gurobi (conic) | Gurobi (SQP) | HiGHS (SQP) |
| --- | --- | --- | --- | --- | --- |
| Baseline | 0.691 | 0.203 | 2.740 | 25.300 | 1.720 |
| Use C types | 0.678 | 0.204 | 2.700 | 25.300 | 1.720 |
| np.bool for $M$ | 0.686 | 0.205 | 2.740 | 25.000 | 1.720 |
| Generic Typing | 0.686 | 0.204 | 2.750 | 25.200 | 1.720 |
| sparray | 0.699 | 0.211 | 2.770 | 27.000 | 1.740 |
| All Changes | 0.697 | 0.210 | 2.820 | 26.200 | 1.720 |

Will have a look this afternoon at what the support horizon is for spmatrix, and whether it's worth holding off on the switch until the deprecation is closer or sparray matures as a class.
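
For reference, a minimal toy illustration of the two interfaces being weighed up (not the RobustOCS code itself): csr_matrix belongs to the older spmatrix family, csr_array to the newer sparray one.

```python
import numpy as np
from scipy import sparse

# Toy comparison of the two sparse interfaces. They differ in places,
# e.g. `*` means matrix multiplication on spmatrix but elementwise
# multiplication on sparray, so the two can't simply be aliased.
dense = np.array([[1.0, 0.0], [0.0, 2.0]])

M_old = sparse.csr_matrix(dense)  # spmatrix, slated for eventual deprecation
M_new = sparse.csr_array(dense)   # sparray, the long-term interface

w = np.ones(2)
print(M_old @ w)  # `@` gives a matrix-vector product on both
print(M_new @ w)
```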

Foggalong commented 1 month ago

While I've not managed to find a way to speed up the sparray operations, I have found some neat stuff about SciPy's loading methods. Using that gets what's possibly a small speed-up for HiGHS, but more importantly it simplifies the loading code.
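
The exact change isn't shown here, but as one hedged illustration of the kind of simplification SciPy's sparse constructors allow, a sparse array can be assembled directly from (row, column, value) triples rather than via a dense intermediate. The triples below are hard-coded purely for illustration.

```python
import numpy as np
from scipy import sparse

# Assemble a sparse array straight from coordinate (row, col, value)
# triples, then convert to CSR for arithmetic. Hard-coded here only
# to keep the example self-contained.
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 1, 1, 2])
vals = np.array([1.0, 1.0, 0.5, 2.0])

A = sparse.coo_array((vals, (rows, cols)), shape=(3, 3)).tocsr()
print(A.toarray())
```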

Final timings

| Changes | Gurobi (standard) | HiGHS (standard) | Gurobi (conic) | Gurobi (SQP) | HiGHS (SQP) |
| --- | --- | --- | --- | --- | --- |
| Baseline | 0.691 | 0.203 | 2.740 | 25.300 | 1.720 |
| 9273dda | 0.694 | 0.187 | 2.730 | 25.300 | 1.710 |