My current theory is that this is because of nogil functions that aren't marked noexcept (see https://cython.readthedocs.io/en/latest/src/userguide/migrating_to_cy30.html#exception-values-and-noexcept).
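To spell out what that migration-guide section describes: in Cython 3.0 a cdef function without an explicit exception specification can propagate Python exceptions, so each call site gets an error check, and inside nogil code that check can also need to re-acquire the GIL. A minimal sketch with hypothetical functions (not from implicit's source):

```cython
# Hypothetical example for illustration only.
# Under Cython 3.0 this function can propagate exceptions, so every call
# site generates an error check (which may need the GIL in nogil sections):
cdef double scale(double x, double factor) nogil:
    return x * factor

# Adding `noexcept` opts back into the 0.29.x behaviour: no per-call check.
cdef double scale_noexcept(double x, double factor) noexcept nogil:
    return x * factor
```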
Looks like this is from the noexcept issue. Adding a `# cython: legacy_implicit_noexcept=True` line to implicit/cpu/_als.pyx causes the training time on ml100k to drop back down to the same as with Cython v0.29.
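For reference, the directive is a special comment that has to sit at the top of the module, before any code; a minimal sketch of the placement (only the directive line itself comes from the comment above):

```cython
# cython: legacy_implicit_noexcept=True
#
# File-level directive: cdef/cpdef functions in this module that don't spell
# out an exception specification are treated as noexcept again, matching the
# Cython 0.29.x behaviour instead of Cython 3.0's propagate-by-default.

# ... the rest of implicit/cpu/_als.pyx stays unchanged ...
```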
It does seem that marking each of these functions as `noexcept nogil` isn't sufficient on its own (for instance, applying this change):
```diff
--- a/implicit/cpu/_als.pyx
+++ b/implicit/cpu/_als.pyx
@@ -15,40 +15,40 @@ from libc.string cimport memcpy, memset
 # lapack/blas wrappers for cython fused types
 cdef inline void axpy(int * n, floating * da, floating * dx, int * incx, floating * dy,
-                      int * incy) nogil:
+                      int * incy) noexcept nogil:
     if floating is double:
         cython_blas.daxpy(n, da, dx, incx, dy, incy)
     else:
         cython_blas.saxpy(n, da, dx, incx, dy, incy)

 cdef inline void symv(char *uplo, int *n, floating *alpha, floating *a, int *lda, floating *x,
-                      int *incx, floating *beta, floating *y, int *incy) nogil:
+                      int *incx, floating *beta, floating *y, int *incy) noexcept nogil:
     if floating is double:
         cython_blas.dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
     else:
         cython_blas.ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)

-cdef inline floating dot(int *n, floating *sx, int *incx, floating *sy, int *incy) nogil:
+cdef inline floating dot(int *n, floating *sx, int *incx, floating *sy, int *incy) noexcept nogil:
     if floating is double:
         return cython_blas.ddot(n, sx, incx, sy, incy)
     else:
         return cython_blas.sdot(n, sx, incx, sy, incy)

-cdef inline void scal(int *n, floating *sa, floating *sx, int *incx) nogil:
+cdef inline void scal(int *n, floating *sa, floating *sx, int *incx) noexcept nogil:
     if floating is double:
         cython_blas.dscal(n, sa, sx, incx)
     else:
         cython_blas.sscal(n, sa, sx, incx)

 cdef inline void posv(char * u, int * n, int * nrhs, floating * a, int * lda, floating * b,
-                      int * ldb, int * info) nogil:
+                      int * ldb, int * info) noexcept nogil:
     if floating is double:
         cython_lapack.dposv(u, n, nrhs, a, lda, b, ldb, info)
     else:
         cython_lapack.sposv(u, n, nrhs, a, lda, b, ldb, info)

 cdef inline void gesv(int * n, int * nrhs, floating * a, int * lda, int * piv, floating * b,
-                      int * ldb, int * info) nogil:
+                      int * ldb, int * info) noexcept nogil:
     if floating is double:
         cython_lapack.dgesv(n, nrhs, a, lda, piv, b, ldb, info)
     else:
         cython_lapack.sgesv(n, nrhs, a, lda, piv, b, ldb, info)
```
Applying this change only reduced training time on the ml100k dataset from 121s to 63s.
However, there were a bunch of warnings like

```
warning: scipy/linalg/cython_blas.pxd:20:72: Implicit noexcept declaration is deprecated. Function declaration should contain 'noexcept' keyword.
```

from the cython_blas / cython_lapack functions we are using. This is probably because I'm compiling against scipy v1.10.1, which doesn't have this fix applied: https://github.com/scikit-learn/scikit-learn/issues/25609.
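For context, the warning points at declarations inside scipy's shipped cython_blas.pxd / cython_lapack.pxd rather than at our own wrappers. A rough before/after sketch of what the warning is asking for (illustrative only, not copied from scipy's source):

```cython
# Illustrative only -- not scipy's exact source. scipy 1.10.1 ships pxd
# declarations without an explicit exception specification, roughly:
cdef void daxpy(int *n, double *da, double *dx, int *incx, double *dy, int *incy) nogil

# The deprecation warning asks for the keyword to be spelled out explicitly:
cdef void daxpy(int *n, double *da, double *dx, int *incx, double *dy, int *incy) noexcept nogil
```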
Cython 3.0.0 has recently been released, and we're seeing it cause a large performance regression compared to Cython 0.29.21.

Running the movielens 100k example (`python examples/movielens.py --variant=100k`), the training step takes 121s when compiled with Cython 3.0.0 but only 1.9s with Cython 0.29.21 (63x slower). The inference performance seems to be roughly similar. Running the ALS unittests (`pytest tests/als_test.py`) takes 589s with Cython 3.0.0, where it only takes 28s with Cython 0.29.21 (21x slower).