benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

Allow GPU models to train on sparse matrices that exceed the size of available GPU memory #605

Closed benfred closed 2 years ago

benfred commented 2 years ago

Use CUDA Unified Virtual Memory (UVM) for sparse matrices on the GPU. This allows GPU models to train on input sparse matrices that exceed the size of GPU memory, by letting CUDA page data to and from host memory on demand.

This has been tested on an ALS model with around 2B entries in the sparse matrix, on a GPU with 16GB of memory. Previously this OOM'ed, since we need around 32GB of GPU memory to store the sparse matrix and its transpose - but with this change training succeeded, and was around 20x faster on the GPU than on the CPU.
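A back-of-the-envelope check of the ~32GB figure, assuming float32 values and int32 column indices per nonzero in CSR format (a sketch; the exact dtypes and the `csr_bytes` helper are assumptions for illustration, not implicit's actual internals):

```python
def csr_bytes(nnz, value_bytes=4, index_bytes=4):
    """Approximate bytes for the data and indices arrays of a CSR matrix.

    Assumes float32 values (4 bytes) and int32 indices (4 bytes);
    the indptr array is negligible by comparison and is ignored.
    """
    return nnz * (value_bytes + index_bytes)

nnz = 2_000_000_000                   # ~2B nonzero entries, as in the test above
one_copy = csr_bytes(nnz)             # ~16 GB for the matrix
with_transpose = 2 * one_copy         # ~32 GB for the matrix plus its transpose
print(one_copy / 1e9, with_transpose / 1e9)  # 16.0 32.0 (GB)
```

Since ALS needs both the item-user matrix and its transpose resident during training, the ~32GB total comfortably exceeds a 16GB card, which is why paging via UVM is what makes this workload fit.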