evfro / polara

Recommender system and evaluation framework for top-n recommendation tasks that respects polarity of feedback. Fast, flexible and easy to use. Written in Python, boosted by the scientific Python stack.
MIT License

Scaled svd and hybrid svd is giving non deterministic output #13

Closed consujdcgcg closed 2 years ago

consujdcgcg commented 3 years ago

Hi @evfro ,

In this blog post https://www.eigentheories.com/blog/lightfm-vs-hybridsvd/, it is mentioned that SVD from polara has deterministic output, but each run of my pipeline gives me different outputs. I am using hybrid SVD and I am being careful with every seed and random_state instantiation, but the issue persists. How can I get the same output across different runs?

evfro commented 3 years ago

@consujdcgcg hi, could you please provide more details with code? Note that determinism comes from the SVD computation (implemented by scipy), not from the polara framework. If you observe problems, most likely something is wrong in your data setup. But anyway, without looking at the code it's hard to tell what the source of your problem is.
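To check whether the SVD step itself is reproducible in a given environment, a minimal sketch along these lines may help. It is not polara code: the toy matrix, the fixed starting vector `v0`, and the helper `top_singular_values` are assumptions made for illustration, and it only uses `scipy.sparse.linalg.svds` directly.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Hypothetical toy data standing in for a user-item matrix.
matrix = sparse_random(50, 30, density=0.2, format="csr",
                       random_state=0, dtype=np.float64)

# Fix the ARPACK starting vector explicitly so every run starts identically.
# svds expects a vector of length min(matrix.shape).
v0 = np.random.RandomState(0).rand(min(matrix.shape))

def top_singular_values(mat, rank, start):
    # The order of singular values returned by svds is not guaranteed,
    # so sort them in descending order before comparing.
    _, s, _ = svds(mat, k=rank, v0=start)
    return np.sort(s)[::-1]

first = top_singular_values(matrix, 5, v0)
second = top_singular_values(matrix, 5, v0)
print("runs agree:", np.allclose(first, second))
```

If the two runs agree here but the full pipeline still varies, the source of randomness is more likely in data splitting or preprocessing than in the decomposition.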

rgrosskopf commented 3 years ago

I've had a similar problem. For example, the Comparing LightFM with HybridSVD.ipynb notebook is returning NaNs and nonsensical precision values from tuning. The conda list output is below. I've also tried a Python 3.8 environment with similar results.

The OS is an Ubuntu 20.04 Docker image from Jupyter stacks running on a macOS host.

This:

print(f'The best value of {target_metric}={svd_scores.max():.4f} was achieved with '
      f'rank={svd_best_config["rank"]} and scaling parameter={svd_best_config["col_scaling"]}.')

Returns: The best value of precision=6221039324650766998772673322609708242190208693316706175944965817336080261590842579394450104503361595226760882161744806515738280386485306496943103715345795769348743453372078174231822260937898055726946448446613113937469607106741712821712360503221412380827016935069738220772234570650310161023058314860167168.0000 was achieved with rank=200 and scaling parameter=0.2.
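Precision is a ratio, so any value outside [0, 1] (or a NaN) signals a computation problem rather than a good model. A small guard like the following can catch this before tuning results are reported. The helper `check_ratio_metric` is hypothetical and not part of polara.

```python
import numpy as np

def check_ratio_metric(name, values):
    """Raise if a ratio-type metric (e.g. precision) is NaN, infinite, or outside [0, 1]."""
    values = np.asarray(values, dtype=float)
    if not np.all(np.isfinite(values)):
        raise ValueError(f"{name} contains NaN or infinite values")
    if values.min() < 0.0 or values.max() > 1.0:
        raise ValueError(
            f"{name} outside [0, 1]: min={values.min():.3g}, max={values.max():.3g}"
        )
    return values
```

Calling it on the tuning scores before taking the maximum would have flagged the value reported above.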


# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
alembic                   1.5.2              pyhd8ed1ab_0    conda-forge
anyio                     2.0.2            py36h5fab9bb_4    conda-forge
argon2-cffi               20.1.0           py36h8f6f2f9_2    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     20.3.0             pyhd3deb0d_0    conda-forge
babel                     2.9.0              pyhd3deb0d_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.1                      py_0    conda-forge
bleach                    3.2.2              pyh44b312d_0    conda-forge
brotlipy                  0.7.0           py36h8f6f2f9_1001    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
certifi                   2020.12.5        py36h5fab9bb_1    conda-forge
cffi                      1.14.4           py36hc120d54_1    conda-forge
chardet                   4.0.0            py36h5fab9bb_1    conda-forge
cliff                     3.6.0              pyhd8ed1ab_0    conda-forge
cmaes                     0.7.0              pyhac0dd68_0    conda-forge
cmd2                      0.9.22           py36h9f0ad1d_1    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
colorlog                  4.7.2            py36h5fab9bb_0    conda-forge
contextvars               2.4                        py_0    conda-forge
cryptography              3.3.1            py36h0a59100_1    conda-forge
cycler                    0.10.0                     py_2    conda-forge
dataclasses               0.7                pyhe4b4509_6    conda-forge
decorator                 4.4.2                      py_0    conda-forge
defusedxml                0.6.0                      py_0    conda-forge
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
icu                       67.1                 he1b5a44_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
immutables                0.14             py36h8f6f2f9_1    conda-forge
importlib-metadata        3.4.0            py36h5fab9bb_0    conda-forge
importlib_metadata        3.4.0                hd8ed1ab_0    conda-forge
ipykernel                 5.4.3            py36he448a4c_0    conda-forge
ipython                   7.12.0           py36h5ca1d4c_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.3              pyhd3deb0d_0    conda-forge
jedi                      0.18.0           py36h5fab9bb_2    conda-forge
jinja2                    2.11.2             pyh9f0ad1d_0    conda-forge
joblib                    1.0.0              pyhd8ed1ab_0    conda-forge
json5                     0.9.5              pyh9f0ad1d_0    conda-forge
jsonschema                3.2.0                      py_2    conda-forge
jupyter_client            6.1.11             pyhd8ed1ab_1    conda-forge
jupyter_core              4.7.0            py36h5fab9bb_1    conda-forge
jupyter_server            1.2.2            py36h5fab9bb_1    conda-forge
jupyterlab                3.0.5              pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
jupyterlab_server         2.1.2              pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        1.0.0              pyhd8ed1ab_1    conda-forge
kiwisolver                1.3.1            py36h605e78d_1    conda-forge
ld_impl_linux-64          2.35.1               hea4e1c9_1    conda-forge
libblas                   3.9.0                7_openblas    conda-forge
libcblas                  3.9.0                7_openblas    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgfortran-ng            9.3.0               hff62375_18    conda-forge
libgfortran5              9.3.0               hff62375_18    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
liblapack                 3.9.0                7_openblas    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
lightfm                   1.16                     pypi_0    pypi
llvmlite                  0.35.0           py36h05121d2_1    conda-forge
mako                      1.1.4              pyh44b312d_0    conda-forge
markupsafe                1.1.1            py36h8f6f2f9_3    conda-forge
matplotlib                3.2.2                         1    conda-forge
matplotlib-base           3.2.2            py36h5fdd944_1    conda-forge
metis                     5.1.0             h58526e2_1006    conda-forge
mistune                   0.8.4           py36h8f6f2f9_1003    conda-forge
nbclassic                 0.2.6              pyhd8ed1ab_0    conda-forge
nbclient                  0.5.1                      py_0    conda-forge
nbconvert                 6.0.7            py36h5fab9bb_3    conda-forge
nbformat                  5.1.2              pyhd8ed1ab_1    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
nest-asyncio              1.4.3              pyhd8ed1ab_0    conda-forge
notebook                  6.2.0            py36h5fab9bb_0    conda-forge
numba                     0.52.0           py36h284efc9_0    conda-forge
numpy                     1.19.5           py36h2aa4a07_1    conda-forge
openssl                   1.1.1i               h7f98852_0    conda-forge
optuna                    2.4.0              pyhd8ed1ab_0    conda-forge
packaging                 20.8               pyhd3deb0d_0    conda-forge
pandas                    1.1.5            py36h284efc9_0    conda-forge
pandoc                    2.11.3.2             h7f98852_0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parso                     0.8.1              pyhd8ed1ab_0    conda-forge
pbr                       5.5.1              pyh9f0ad1d_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pip                       21.0               pyhd8ed1ab_0    conda-forge
polara                    0.7.2                    pypi_0    pypi
prettytable               2.0.0              pyhd8ed1ab_0    conda-forge
prometheus_client         0.9.0              pyhd3deb0d_0    conda-forge
prompt-toolkit            3.0.13             pyha770c72_0    conda-forge
prompt_toolkit            3.0.13               hd8ed1ab_0    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pygments                  2.7.4              pyhd8ed1ab_0    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyperclip                 1.8.1              pyhd3deb0d_0    conda-forge
pyrsistent                0.17.3           py36h8f6f2f9_2    conda-forge
pysocks                   1.7.1            py36h5fab9bb_3    conda-forge
python                    3.6.12          hffdb5ce_0_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-editor             1.0.4                      py_0    conda-forge
python_abi                3.6                     1_cp36m    conda-forge
pytz                      2020.5             pyhd8ed1ab_0    conda-forge
pyyaml                    5.4.1            py36h8f6f2f9_0    conda-forge
pyzmq                     20.0.0           py36h81c33ee_1    conda-forge
readline                  8.0                  he28a2e2_2    conda-forge
requests                  2.25.1             pyhd3deb0d_0    conda-forge
scikit-learn              0.24.1                   pypi_0    pypi
scikit-sparse             0.4.4           py36hd282510_1004    conda-forge
scipy                     1.5.3            py36h9e8f40b_0    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                49.6.0           py36h5fab9bb_3    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sniffio                   1.2.0            py36h5fab9bb_1    conda-forge
sqlalchemy                1.3.22           py36h8f6f2f9_1    conda-forge
sqlite                    3.34.0               h74cdb3f_0    conda-forge
stevedore                 3.3.0            py36h5fab9bb_1    conda-forge
suitesparse               5.7.2                h717dc36_0    conda-forge
tbb                       2020.2               h4bd325d_3    conda-forge
terminado                 0.9.2            py36h5fab9bb_0    conda-forge
testpath                  0.4.4                      py_0    conda-forge
threadpoolctl             2.1.0                    pypi_0    pypi
tk                        8.6.10               h21135ba_1    conda-forge
tornado                   6.1              py36h8f6f2f9_1    conda-forge
tqdm                      4.56.0             pyhd8ed1ab_0    conda-forge
traitlets                 4.3.3            py36h9f0ad1d_1    conda-forge
typing_extensions         3.7.4.3                    py_0    conda-forge
urllib3                   1.26.2             pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
widgetsnbextension        3.5.1            py36h5fab9bb_4    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
zeromq                    4.3.3                h58526e2_3    conda-forge
zipp                      3.4.0                      py_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge

evfro commented 3 years ago

@rgrosskopf thanks for the detailed report and sorry for the long wait. I believe I tracked the problem down and fixed it. More specifically, it was a problem with an uninitialized numpy array in the calculation of evaluation metrics. The bug was introduced in https://github.com/evfro/polara/commit/22747227954f7dd75713875a6f6b10c703c32c60. Fixed by https://github.com/evfro/polara/commit/dc6cf9e9a9f551e46b34418d24d8772b9561ce4a.
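For readers unfamiliar with this class of bug, here is a generic illustration (not the actual polara code; `count_hits` and the toy batches are made up for this sketch). `np.empty` returns an array whose contents are whatever bytes happened to be in memory, so accumulating into it can give arbitrary and run-dependent results, while `np.zeros` gives a well-defined starting point.

```python
import numpy as np

def count_hits(batches, n_users, allocate=np.zeros):
    # The accumulator must start from zeros. With allocate=np.empty the
    # initial contents are arbitrary, so the sums below are unreliable.
    hits = allocate(n_users)
    for user_ids, batch_hits in batches:
        hits[user_ids] += batch_hits
    return hits

# Hypothetical per-batch hit counts for four users.
batches = [
    (np.array([0, 1]), np.array([1.0, 0.0])),
    (np.array([2, 3]), np.array([2.0, 1.0])),
]

safe = count_hits(batches, n_users=4)                      # zero-initialized
risky = count_hits(batches, n_users=4, allocate=np.empty)  # may differ between runs
print(safe)
```

The `risky` result may happen to match `safe` on some runs, which is what makes this kind of bug look like non-determinism.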

Could you please install the latest develop version and check that you no longer experience the issue? You can simply upgrade polara by running:

pip install --no-cache-dir --upgrade git+https://github.com/Evfro/polara.git@develop#egg=polara

rgrosskopf commented 3 years ago

Works for me! (or at least I'm getting plausible results) Thanks for getting the fix in.

I'm still getting an error running the optuna tuning for LightFM (v1.16) in the Comparing LightFM with HybridSVD.ipynb demo but my main goal was to get a working starting point to compare to HybridSVD and that I have.

evfro commented 2 years ago

I'm closing the issue. Feel free to open a new one if something is still not working on the polara side.