PyEllips / pyElli

An open source ellipsometry analysis tool for reproducible and comprehensible building of optical models.
https://pyelli.readthedocs.io
GNU General Public License v3.0
17 stars 6 forks source link

Expm is getting slow in `scipy==1.13.1` #178

Open domna opened 2 months ago

domna commented 2 months ago

With ~numpy==2.0.0~scipy==1.13.1 the performance of the solver drops tremendously. We should investigate where this is coming from and see if we can fix this by replacing to the proper (quicker) functions.

MarJMue commented 2 months ago

I will take a look, I should have a profiling notebook somewhere.

MarJMue commented 2 months ago

Interestingly if i run the benchmark locally (Fedora, Python 3.10 and 3.12) I don't get any performance regression. Maybe it's a Problem with the runner.

------------------------------------------------------------
Name (time in ms)          OPS(2.0.0)       OPS(1.26.4)     
------------------------------------------------------------
test_solver2x2           513.6902 (1.0)   422.0736 (1.0)    
test_solver4x4_linear    379.5551 (0.74)  292.9602 (0.69)   
test_solver4x4_eig        65.4104 (0.13)   49.6542 (0.12)   
test_solver4x4_expm        3.5880 (0.01)    3.5572 (0.01)   
------------------------------------------------------------
domna commented 2 months ago

I see issues locally, too

numpy 2

------------------------------------------------------------------------------------------ benchmark: 6 tests -----------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean              StdDev              Median                IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_solver2x2                    3.3913 (1.0)       11.2985 (1.30)       4.6075 (1.0)        2.4295 (3.04)       3.6041 (1.0)       1.4018 (4.11)          1;1  217.0365 (1.0)          10           1
test_solver4x4_linear             3.7497 (1.11)       8.9823 (1.04)       4.7736 (1.04)       1.6175 (2.02)       4.0946 (1.14)      0.9789 (2.87)          1;1  209.4871 (0.97)         10           1
test_formula_solver2x2            5.9750 (1.76)       8.6650 (1.0)        6.4390 (1.40)       0.8001 (1.0)        6.2300 (1.73)      0.3410 (1.0)           1;1  155.3033 (0.72)         10           1
test_solver4x4_eig               42.4544 (12.52)     67.9676 (7.84)      47.8688 (10.39)      8.3157 (10.39)     44.0447 (12.22)     6.4657 (18.96)         2;1   20.8905 (0.10)         10           1
test_solver4x4_expm             449.6212 (132.58)   540.5948 (62.39)    471.5749 (102.35)    28.4518 (35.56)    462.2354 (128.25)   20.4840 (60.07)         1;1    2.1206 (0.01)         10           1
test_formula_solver4x4_expm     457.4233 (134.88)   800.8481 (92.42)    506.0184 (109.82)   104.2926 (130.35)   477.2739 (132.43)   24.4018 (71.56)         1;1    1.9762 (0.01)         10           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

numpy 1.x

---------------------------------------------------------------------------------------- benchmark: 6 tests ----------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_solver2x2                    3.7060 (1.0)        5.8486 (1.0)        4.0890 (1.0)      0.6535 (1.06)       3.8295 (1.0)      0.4599 (1.05)          1;1  244.5583 (1.0)          10           1
test_solver4x4_linear             4.3303 (1.17)       5.9519 (1.02)       4.9078 (1.20)     0.6175 (1.0)        4.6249 (1.21)     1.2390 (2.83)          3;0  203.7588 (0.83)         10           1
test_formula_solver2x2            5.9571 (1.61)       8.4977 (1.45)       6.4565 (1.58)     0.7682 (1.24)       6.1007 (1.59)     0.4382 (1.0)           1;1  154.8831 (0.63)         10           1
test_solver4x4_eig              114.9540 (31.02)    132.4588 (22.65)    118.5942 (29.00)    6.1394 (9.94)     115.7738 (30.23)    1.1952 (2.73)          2;2    8.4321 (0.03)         10           1
test_solver4x4_expm             393.4410 (106.16)   413.0689 (70.63)    403.1043 (98.58)    5.6653 (9.17)     402.3795 (105.07)   8.5332 (19.47)         2;0    2.4807 (0.01)         10           1
test_formula_solver4x4_expm     397.5010 (107.26)   410.9472 (70.26)    402.0846 (98.33)    4.2841 (6.94)     401.0340 (104.72)   3.6075 (8.23)          3;1    2.4870 (0.01)         10           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

But mainly for the eigen value solver

MarJMue commented 2 months ago

The eigenvalue solver is getting faster! In the CI benchmark, the Expm broke down by a factor of 4, which we both can not reproduce.

Edit: The culprit is SciPy. The regression was introduced in between 1.12 and 1.13.

domna commented 2 months ago

Oopsie, true. Sorry, just quickly checked on the side today and I just saw that something changed. I can confirm that scipy==0.12 is faster by a factor of 4 for me, too.

Here are the stats with numpy-1.26.4 and scipy-1.12.0:

----------------------------------------------------------------------------------------- benchmark: 6 tests -----------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean             StdDev              Median                IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_solver2x2                    3.6657 (1.0)        4.1063 (1.0)        3.7916 (1.0)       0.1357 (1.0)        3.7635 (1.0)       0.1749 (1.18)          1;0  263.7438 (1.0)          10           1
test_solver4x4_linear             4.4456 (1.21)       5.0880 (1.24)       4.6169 (1.22)      0.1938 (1.43)       4.5882 (1.22)      0.1486 (1.0)           1;1  216.5961 (0.82)         10           1
test_formula_solver2x2            5.8515 (1.60)       9.1948 (2.24)       6.4353 (1.70)      1.0004 (7.37)       6.0991 (1.62)      0.4927 (3.32)          1;1  155.3918 (0.59)         10           1
test_solver4x4_expm              81.0214 (22.10)     99.6861 (24.28)     87.8797 (23.18)     6.2004 (45.69)     84.9936 (22.58)     8.0749 (54.35)         3;0   11.3792 (0.04)         10           1
test_formula_solver4x4_expm      82.5994 (22.53)    197.2490 (48.04)    105.7262 (27.88)    34.1210 (251.45)    94.7500 (25.18)    28.3755 (190.97)        1;1    9.4584 (0.04)         10           1
test_solver4x4_eig              115.4885 (31.51)    126.0902 (30.71)    117.8930 (31.09)     3.9314 (28.97)    116.0587 (30.84)     0.9391 (6.32)          2;2    8.4823 (0.03)         10           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I did not really find anything on a major change of expm in scipy 1.13 but there are some bugfixes on it in the release notes. Maybe we should build a minimal example just for expm and open an issue at scipy with it.

domna commented 2 months ago

Indeed when I run this

import time

import numpy as np
from scipy.linalg import expm

mat = np.array(
    [
        [0.0 - 0.0j, 0.0 - 0.0j, 0.0 - 0.0j, 0.0 - 2.5774387j],
        [0.0 - 0.0j, 0.0 - 0.0j, 0.0 + 4.3402231j, 0.0 - 0.0j],
        [0.0 - 0.0j, 0.0 + 5.60367362j, 0.0 - 0.0j, 0.0 - 0.0j],
        [0.0 - 9.43618706j, 0.0 - 0.0j, 0.0 - 0.0j, 0.0 - 0.0j],
    ]
)

start_time = time.time()
for _ in range(10000):
    expm(mat)
print(f"{(time.time() - start_time) * 1000} ms")

I get ~360 ms for scipy < 1.13 and 1800 ms for scipy >=1.13.

domna commented 2 months ago

hmm... I had a look at the changes of expm between v1.13.1 and v1.12.0 and the only thing that changes was that there was an explicit formula for 2x2 matrices which has been removed. So there must be a change in some other function expm calls which slows it down.

MarJMue commented 2 months ago

Yeah, it's not directly obvious what they changed. We can try to report the regression.

On the other hand, I did try out PyTorch again. They have improved performance massively. For me, it is comparable to the eigenvalue solver. Maybe we should make it an alternative again?

-------------------------------------------------
Name (time in ms)                     OPS           
-------------------------------------------------
test_solver2x2                   523.0784 (1.0)     
test_solver4x4_linear            476.9422 (0.91)    
test_formula_solver2x2           279.7011 (0.53)    
test_solver4x4_torch_expm         63.3393 (0.12)    
test_solver4x4_eig                67.1695 (0.13)    
test_solver4x4_expm                4.3523 (0.01)    
test_formula_solver4x4_expm        5.7128 (0.01)    
-------------------------------------------------
domna commented 2 months ago

Yeah, it's not directly obvious what they changed.

On the other hand, I did try out PyTorch again. They have improved performance massively. For me, it is comparable to the eigenvalue solver. Maybe we should make it an alternative again?

Yes, sounds good to have an alternative. However, I also tried to use torch and ran the above example and got even worse performance (~2500 ms I think it was). Can you share your code and libs? Then I'll try to reproduce it.

I would also open an issue at scipy to start a discussion there at least. Maybe someone there might just know what's going wrong.

MarJMue commented 2 months ago

I have prepared a branch with Pytorch at #181

domna commented 2 months ago

https://github.com/scipy/scipy/issues/21078