PythonOptimizers / cysparse

Python/Cython library to replace PySparse
http://PythonOptimizers.github.io/cysparse

Investigate why SciPy is faster for CSC.matvec #239

Closed. ghost closed this issue 8 years ago.

ghost commented 8 years ago

The bottleneck is the initialization of the y array...
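
For context, a CSC matvec kernel has roughly the shape below (a minimal Cython sketch with illustrative names, not the actual cysparse code): the accumulation scatters contributions into y, so y must be zeroed first, and that zeroing pass is O(n) while the accumulation is only O(nnz).

```cython
# Minimal sketch of a CSC matvec kernel; identifiers are illustrative,
# not the actual cysparse internals.  A has n rows and m columns, with
# column pointers ind_ptr, row indices row_ind and values val.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef void csc_matvec_sketch(int n, int m,
                            int *ind_ptr, int *row_ind, double *val,
                            double *x, double *y):
    cdef int i, j, k
    # Initialization pass: O(n), regardless of the number of nonzeros.
    for i in range(n):
        y[i] = 0.0
    # Accumulation pass: O(nnz), scattering each column's contributions into y.
    for j in range(m):
        for k in range(ind_ptr[j], ind_ptr[j + 1]):
            y[row_ind[k]] += val[k] * x[j]
```

When nnz is small compared to the vector size (e.g. 5,000 nonzeros with size = 1,000,000 in the benchmarks below), the O(n) initialization pass dominates the whole call.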

ghost commented 8 years ago

Now using np.zeros() where possible, and Cython's memset otherwise.

ghost commented 8 years ago

Backtracked: using np.zeros() would have been redundant with some existing code and with what you (@dpo) want to achieve (i.e. A.matvec(x, y)). We did (slightly) better than SciPy for CSC with that implementation, but it is not worth it.

I'm still using memset, though, as it is quicker than a loop (of course).
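
The zeroing step itself can be written with memset on the raw buffer rather than an explicit loop. A hedged sketch (assuming y points to a contiguous C array of doubles; the function name is illustrative):

```cython
# Sketch of the zeroing step only; zero_out is an illustrative name.
from libc.string cimport memset

cdef inline void zero_out(double *y, int n):
    # memset fills n * sizeof(double) bytes with zeros; the all-zero
    # byte pattern is a valid IEEE-754 representation of 0.0.
    memset(y, 0, n * sizeof(double))
```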

ghost commented 8 years ago

For the record:

Benchmark Report

matvec with 1,000 elements and size = 10,000

name           rank  runs  mean       sd         timesBaseline
cysparse2      1     100   2.356e-05  1.954e-06  1.0
cysparse       2     100   2.42e-05   3.323e-06  1.02732240437
scipy sparse2  3     100   2.522e-05  4.87e-06   1.07053228091
scipy sparse   4     100   2.626e-05  2.51e-06   1.11455171018

matvec with 10,000 elements and size = 100,000

name           rank  runs  mean       sd         timesBaseline
cysparse2      1     100   0.0001921  8.291e-06  1.0
cysparse       2     100   0.0001968  4.033e-05  1.02422707922
scipy sparse   3     100   0.0001982  2.345e-05  1.03203385834
scipy sparse2  4     100   0.0001997  3.779e-05  1.03960482059

matvec with 100,000 elements and size = 1,000,000

name           rank  runs  mean       sd         timesBaseline
scipy sparse2  1     100   0.00448    8.045e-05  1.0
scipy sparse   2     100   0.004498   0.0001532  1.00396344773
cysparse2      3     100   0.004884   0.0002421  1.09010723872
cysparse       4     100   0.004888   0.0002864  1.09090599477

matvec with 5,000 elements and size = 1,000,000

name           rank  runs  mean       sd         timesBaseline
cysparse2      1     100   0.001559   6.356e-05  1.0
scipy sparse2  2     100   0.001559   7.308e-05  1.00012998989
cysparse       3     100   0.001583   0.0001955  1.01551926374
scipy sparse   4     100   0.001592   0.0002038  1.02095589978

Each of the above 1600 runs was run in random, non-consecutive order by benchmark v0.1.5 (http://jspi.es/benchmark) with Python 2.7.5+ on Linux-3.11.0-12-generic-x86_64 on 2016-03-04 11:26:58.

ghost commented 8 years ago

Actually, we weren't even better than SciPy! ;-)

ghost commented 8 years ago

Current code:

Benchmark Report

matvec with 1,000 elements and size = 10,000

name           rank  runs  mean       sd         timesBaseline
scipy sparse2  1     100   2.508e-05  4.849e-06  1.0
cysparse2      2     100   2.543e-05  4.174e-06  1.01397604107
cysparse       3     100   2.579e-05  2.631e-06  1.02842745769
scipy sparse   4     100   2.62e-05   3.498e-06  1.04459022628

matvec with 10,000 elements and size = 100,000

name           rank  runs  mean       sd         timesBaseline
scipy sparse   1     100   0.0001946  9.274e-06  1.0
scipy sparse2  2     100   0.0001959  1.385e-05  1.00668855351
cysparse       3     100   0.0002143  9.418e-06  1.10098980792
cysparse2      4     100   0.000224   4.933e-05  1.15109270874

matvec with 100,000 elements and size = 1,000,000

name           rank  runs  mean       sd         timesBaseline
scipy sparse2  1     100   0.004495   9.861e-05  1.0
scipy sparse   2     100   0.004515   0.0002076  1.00458403417
cysparse       3     100   0.004884   0.0001992  1.08651210801
cysparse2      4     100   0.004899   0.0002066  1.08996419022

matvec with 5,000 elements and size = 1,000,000

name           rank  runs  mean       sd         timesBaseline
cysparse2      1     100   0.001577   4.566e-05  1.0
cysparse       2     100   0.001583   4.705e-05  1.00427014407
scipy sparse2  3     100   0.001593   0.0001809  1.01053927202
scipy sparse   4     100   0.001607   0.0001678  1.01915667679

Each of the above 1600 runs was run in random, non-consecutive order by benchmark v0.1.5 (http://jspi.es/benchmark) with Python 2.7.5+ on Linux-3.11.0-12-generic-x86_64 on 2016-03-04 12:23:24.

ghost commented 8 years ago

Just got a strange segfault... Backtracking completely to the previous code with loops. Not very "efficient", but stable.

Bottom line: the bottleneck is the initialization of the y array. I don't think it is worth changing that.