Closed ghost closed 8 years ago
Using np.zeros()
where possible and otherwise Cython's memset
.
Backtracked: use of np.zeros()
would have been redundant with some existing code and what you (@dpo) want to achieve (i.e. A.matvec(x, y) ). We did (slightly) better than SciPy for CSC with that implementation but it is not worth it.
I'm using memset
though as it is quicker than a loop (of course).
For the record:
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
cysparse2 | 1 | 100 | 2.356e-05 | 1.954e-06 | 1.0 |
cysparse | 2 | 100 | 2.42e-05 | 3.323e-06 | 1.02732240437 |
scipy sparse2 | 3 | 100 | 2.522e-05 | 4.87e-06 | 1.07053228091 |
scipy sparse | 4 | 100 | 2.626e-05 | 2.51e-06 | 1.11455171018 |
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
cysparse2 | 1 | 100 | 0.0001921 | 8.291e-06 | 1.0 |
cysparse | 2 | 100 | 0.0001968 | 4.033e-05 | 1.02422707922 |
scipy sparse | 3 | 100 | 0.0001982 | 2.345e-05 | 1.03203385834 |
scipy sparse2 | 4 | 100 | 0.0001997 | 3.779e-05 | 1.03960482059 |
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
scipy sparse2 | 1 | 100 | 0.00448 | 8.045e-05 | 1.0 |
scipy sparse | 2 | 100 | 0.004498 | 0.0001532 | 1.00396344773 |
cysparse2 | 3 | 100 | 0.004884 | 0.0002421 | 1.09010723872 |
cysparse | 4 | 100 | 0.004888 | 0.0002864 | 1.09090599477 |
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
cysparse2 | 1 | 100 | 0.001559 | 6.356e-05 | 1.0 |
scipy sparse2 | 2 | 100 | 0.001559 | 7.308e-05 | 1.00012998989 |
cysparse | 3 | 100 | 0.001583 | 0.0001955 | 1.01551926374 |
scipy sparse | 4 | 100 | 0.001592 | 0.0002038 | 1.02095589978 |
Each of the above 1600 runs were run in random, non-consecutive order by
benchmark
v0.1.5 (http://jspi.es/benchmark) with Python 2.7.5+
Linux-3.11.0-12-generic-x86_64 on 2016-03-04 11:26:58.
Actually, we weren't even better than SciPy! ;-)
Current code:
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
scipy sparse2 | 1 | 100 | 2.508e-05 | 4.849e-06 | 1.0 |
cysparse2 | 2 | 100 | 2.543e-05 | 4.174e-06 | 1.01397604107 |
cysparse | 3 | 100 | 2.579e-05 | 2.631e-06 | 1.02842745769 |
scipy sparse | 4 | 100 | 2.62e-05 | 3.498e-06 | 1.04459022628 |
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
scipy sparse | 1 | 100 | 0.0001946 | 9.274e-06 | 1.0 |
scipy sparse2 | 2 | 100 | 0.0001959 | 1.385e-05 | 1.00668855351 |
cysparse | 3 | 100 | 0.0002143 | 9.418e-06 | 1.10098980792 |
cysparse2 | 4 | 100 | 0.000224 | 4.933e-05 | 1.15109270874 |
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
scipy sparse2 | 1 | 100 | 0.004495 | 9.861e-05 | 1.0 |
scipy sparse | 2 | 100 | 0.004515 | 0.0002076 | 1.00458403417 |
cysparse | 3 | 100 | 0.004884 | 0.0001992 | 1.08651210801 |
cysparse2 | 4 | 100 | 0.004899 | 0.0002066 | 1.08996419022 |
name | rank | runs | mean | sd | timesBaseline |
---|---|---|---|---|---|
cysparse2 | 1 | 100 | 0.001577 | 4.566e-05 | 1.0 |
cysparse | 2 | 100 | 0.001583 | 4.705e-05 | 1.00427014407 |
scipy sparse2 | 3 | 100 | 0.001593 | 0.0001809 | 1.01053927202 |
scipy sparse | 4 | 100 | 0.001607 | 0.0001678 | 1.01915667679 |
Each of the above 1600 runs were run in random, non-consecutive order by
benchmark
v0.1.5 (http://jspi.es/benchmark) with Python 2.7.5+
Linux-3.11.0-12-generic-x86_64 on 2016-03-04 12:23:24.
Just got strange segfault... Backtracking completely to previous code with loops. Not very "efficient" but stable.
Bottom line: the bottleneck is the init of the y
array. I don't think it is worth to change that.
The bottleneck is with the init of the
y
array...