benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License
3.57k stars 612 forks source link

change the type of u to prevent illegal memory access #722

Open jmorlock opened 1 month ago

jmorlock commented 1 month ago

I was applying implicit's AlternatingLeastSquares with CUDA on a matrix of shape (4091537, 5256984). The number of factors was set to 1011.

In doing so, AlternatingLeastSquares was failing with the following traceback:

Traceback (most recent call last):
  File "/path/to/my/project/demo.py", line 18, in <module>
    model.fit(customer_product_matrix)
  File "/path/to/my/venv/lib/python3.9/site-packages/implicit/gpu/als.py", line 162, in fit
    self.solver.least_squares(Ciu, Y, _XtX, X, self.cg_steps)
  File "_cuda.pyx", line 258, in implicit.gpu._cuda.LeastSquaresSolver.least_squares
RuntimeError: Cuda Error: an illegal memory access was encountered (/project/implicit/gpu/als.cu:196)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Cuda Error: an illegal memory access was encountered (/project/implicit/gpu/matrix.cu:246)
Aborted (core dumped)

The last line in the stack trace is a follow-up error referring to the destructor of CSRMatrix.

I tracked the original error down to the CUDA kernel. To be precise, to the lines 38 and 39 in als.cu (referring to implicit version 0.7.2):

  for (int u = blockIdx.x; u < user_count; u += gridDim.x) {
    T *x = &X[u * factors];

Both u and factors are of type int and u may go up to user_count-1. Considering the aforementioned matrix dimensions, there are values for u where this product exceeds the maximum value for integers (2147483647) in C++ leading to a negative array index.

By using size_t as type for u, this can be prevented.

Side note: it is remarkable that the second call to least_squares (line 162 in als.py) is failing even though the first call (line 159 in als.py) already produces negative array indexes.