Scaled SVD and HybridSVD give non-deterministic output #13

Closed: consujdcgcg closed this issue 2 years ago

consujdcgcg commented 3 years ago

Hi @evfro,

In this blog post https://www.eigentheories.com/blog/lightfm-vs-hybridsvd/, it is mentioned that SVD in polara produces deterministic output, but each run of my pipeline gives different results. I am using HybridSVD and have been careful to set every seed and random_state, yet the issue persists. How can I get the same output across runs?

evfro commented 3 years ago

@consujdcgcg hi, could you please provide more details, ideally with code? Note that the determinism comes from the SVD computation (implemented in scipy), not from the polara framework. If you observe problems, most likely something is wrong in your data setup. In any case, without looking at the code it's hard to tell what the source of your problem is.
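As a minimal sketch of what "deterministic SVD" means in practice, assuming a truncated SVD computed with `scipy.sparse.linalg.svds` (whether polara calls it exactly this way is an assumption here): unless a starting vector `v0` is supplied, ARPACK may start its iterations from a randomly generated vector. The example pins `v0`, runs the decomposition twice on a synthetic toy matrix (not data from the blog post), and checks that both runs agree. The helper name `truncated_svd` is hypothetical.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds


def truncated_svd(matrix, rank, seed=0):
    """Hypothetical helper: rank-`rank` SVD with a fixed ARPACK starting vector."""
    # svds expects v0 of length min(matrix.shape); fixing it removes
    # the random initialization from the computation.
    v0 = np.random.default_rng(seed).standard_normal(min(matrix.shape))
    return svds(matrix, k=rank, v0=v0)


# Synthetic sparse "user-item" matrix, generated reproducibly.
R = sparse_random(50, 30, density=0.2, format="csr", random_state=42)

u1, s1, vt1 = truncated_svd(R, rank=5)
u2, s2, vt2 = truncated_svd(R, rank=5)

# Compare low-rank reconstructions rather than raw singular vectors:
# singular vectors are only defined up to sign, the product is not.
rec1 = u1 @ np.diag(s1) @ vt1
rec2 = u2 @ np.diag(s2) @ vt2
print(np.allclose(s1, s2), np.allclose(rec1, rec2))
```

If two runs of a full pipeline still disagree after a check like this passes, the difference most likely comes from an earlier step (data splitting, sampling, preprocessing) rather than from the SVD itself.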

rgrosskopf commented 3 years ago

I've had a similar problem: for example, the Comparing LightFM with HybridSVD.ipynb notebook returns NaNs and nonsensical precision values from tuning. The conda list output is below. I've also tried a Python 3.8 environment with similar results.

The OS is an Ubuntu 20.04 Docker image from Jupyter Docker Stacks, running on a macOS host.


print(f'The best value of {target_metric}={svd_scores.max():.4f} was achieved with '
      f'rank={svd_best_config["rank"]} and scaling parameter={svd_best_config["col_scaling"]}.')

Returns: The best value of precision=6221039324650766998772673322609708242190208693316706175944965817336080261590842579394450104503361595226760882161744806515738280386485306496943103715345795769348743453372078174231822260937898055726946448446613113937469607106741712821712360503221412380827016935069738220772234570650310161023058314860167168.0000 was achieved with rank=200 and scaling parameter=0.2.
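Precision is the fraction of recommended items that are relevant, so a NaN or any value outside [0, 1] signals a bug rather than a tuning result. A minimal sketch of a guard that could be run on the tuning scores before picking the best configuration; `validate_scores` is a hypothetical helper, not part of polara or the notebook.

```python
import numpy as np


def validate_scores(scores, name="precision"):
    """Hypothetical helper: reject values impossible for a [0, 1]-bounded metric."""
    scores = np.asarray(scores, dtype=float)
    bad = ~np.isfinite(scores) | (scores < 0) | (scores > 1)
    if bad.any():
        raise ValueError(f"{bad.sum()} invalid {name} value(s), e.g. {scores[bad][0]}")
    return scores


validate_scores([0.12, 0.15, 0.14])  # plausible scores pass through unchanged
try:
    validate_scores([0.12, float("nan"), 6.2e300])
except ValueError as err:
    print(err)  # reports the NaN and the out-of-range value
```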

evfro commented 3 years ago

@rgrosskopf thanks for the detailed report, and sorry for the long wait. I believe I tracked the problem down and fixed it. More specifically, it was caused by an uninitialized NumPy array in the calculation of evaluation metrics. The bug was introduced in https://github.com/evfro/polara/commit/22747227954f7dd75713875a6f6b10c703c32c60 and fixed by https://github.com/evfro/polara/commit/dc6cf9e9a9f551e46b34418d24d8772b9561ce4a.
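The linked commits contain the actual fix; the snippet below is only a generic illustration of this class of bug, using a hypothetical `precision_at_k` helper that is not polara's implementation. `np.empty` allocates memory without initializing it, so an accumulator created with it can start from arbitrary leftover values, which is consistent with the astronomically large precision reported above. Initializing with `np.zeros` avoids that.

```python
import numpy as np


def precision_at_k(recommended, relevant, k):
    """Mean precision@k over users (hypothetical helper, not polara's API)."""
    # Use np.zeros: np.empty would hand back uninitialized memory, and the
    # in-place += below would then add hits on top of arbitrary garbage.
    hits = np.zeros(len(recommended), dtype=np.float64)
    for i, (recs, rel) in enumerate(zip(recommended, relevant)):
        hits[i] += len(set(recs[:k]) & set(rel))
    return float((hits / k).mean())


recommended = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
relevant = [[1, 3], [10], [7, 8, 9]]
print(precision_at_k(recommended, relevant, k=3))  # 5/9 ≈ 0.5556
```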

Could you please install the latest develop version and check that you no longer experience the issue? You can simply upgrade polara by running:

pip install --no-cache-dir --upgrade git+https://github.com/Evfro/polara.git@develop#egg=polara

rgrosskopf commented 3 years ago

Works for me! (Or at least I'm getting plausible results.) Thanks for getting the fix in.

I'm still getting an error when running the Optuna tuning for LightFM (v1.16) in the Comparing LightFM with HybridSVD.ipynb demo, but my main goal was to get a working starting point for comparison with HybridSVD, and I have that now.

evfro commented 2 years ago

I'm closing the issue. Feel free to open a new one if something still doesn't work on the polara side.