Closed ProfHercules closed 2 years ago
@ProfHercules Thank you so much for your dedication and for taking the time to submit a PR. I'm sorry I couldn't get to it sooner.
Since you submitted, I've made a bunch of modifications on the master
branch. In master
I have removed some old code relying on six
, which is where a bunch of Python interaction were coming from when using range()
in Cython. Also I enabled by default a bunch of compiler directive like the ones you were using here (bouncheck=False, etc.). I also updated the CI so that it can run small benchmarks when submitting a PR (it's the "Benchmark / build (3.9) (pull_request)" below).
Before changes on master
: https://github.com/NicolasHug/Surprise/pull/422
Movielens 100k | RMSE | MAE | Time |
---|---|---|---|
SVD | 0.936 | 0.738 | 0:00:30 |
SVD++ | 0.922 | 0.723 | 0:16:16 |
NMF | 0.964 | 0.758 | 0:00:30 |
After changes on master https://github.com/NicolasHug/Surprise/pull/421 |
Movielens 100k | RMSE | MAE | Time |
---|---|---|---|---|
SVD | 0.936 | 0.738 | 0:00:07 | |
SVD++ | 0.922 | 0.723 | 0:05:24 | |
NMF | 0.964 | 0.758 | 0:00:09 |
This PR:
Movielens 100k | RMSE | MAE | Time |
---|---|---|---|
SVD | 0.936 | 0.738 | 0:00:05 |
SVD++ cache_ratings=False | 0.922 | 0.723 | 0:01:54 |
SVD++ cache_ratings=True | 0.921 | 0.722 | 0:01:38 |
NMF | 0.965 | 0.759 | 0:00:08 |
I've also made a bunch of changes to this PR. Some of them are cosmetic, some of them are maybe more relevant:
sqrt(Iu_length)
as I didn't notice any significant performance hit, and it saves of bit of memoryrange()
calls since they don't generate interactions anymore (note: to match the Cython options from setup.py
we need to use -X wraparound=False,boundscheck=False,cdivision=True,language_level=3
now).map
to just use a vector[vector[int]]
but in the end a good old malloced int **
could work just as well here, so I went for that. Hopefully I didn't introduce any memory leak. n_ratings
ints (the Iu
s), so it can be a bit expensive in terms of memory. So I implemented two version: cache_ratings=True
which corresponds to your strategy, where all ratings are cached, and cache_ratings=False
where we still retrieve Iu
on the fly, but in a much more efficient data-structure and with fewer Python interations. Both are a lot faster than master
, and cache_ratings=True
is still a bit faster thancache_ratings=False
- for a higher memory footprint.On the benchmark above we observe similar speedups as you did: 6X for SVD and ~10X for SVDpp.
Merging now, thanks a lot again for your PR @ProfHercules , this is a very nice improvement!
All changes are documented more clearly in the commits. Before starting to make changes, I ran all tests using Python 3.9, they all passed. After all changes, all the tests still pass.
Initially, on my M1 MacBook Air,
examples/bench_mf.py
produced the following:After all changes, the run time looks like this:
Which is a 6x improvement in speed for
SVD
and a 13.38x improvement forSVDpp
.I also ran the same benchmark using the MovieLens 1M dataset, and got the following:
The primary differences between this PR and #400 is
Note: I documented the process here, feel free to give it a read!