aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0
3.09k stars 519 forks

Different results with same seed #340

Closed ankane closed 2 years ago

ankane commented 4 years ago

Hi, creating models with the same seed in the same program generates different training loss and predictions.

import xlearn as xl

for i in range(10):
    model = xl.FMModel(task="reg", seed=1, nthread=1, opt="adagrad")
    model.fit("data.csv")
    print(i, model.predict("data.csv")[:5].tolist())

The results are consistent across separate runs of the program, but not across iterations within the same run.

0 [1.026936650276184, 1.0957672595977783, 1.0505752563476562, 1.1147269010543823, 1.0406986474990845]
1 [0.9017000198364258, 0.9972414374351501, 0.9316112399101257, 0.9791277647018433, 0.9119905829429626]
2 [0.9654833674430847, 1.0520145893096924, 0.9934350252151489, 1.0509999990463257, 0.9800773859024048]
3 [0.9456406235694885, 1.0239505767822266, 0.9725252389907837, 1.022987961769104, 0.9517906308174133]
4 [0.9248837828636169, 1.0065243244171143, 0.9512857794761658, 0.9991171956062317, 0.9310711622238159]
5 [0.8975953459739685, 0.9884685277938843, 0.9281622171401978, 0.9772773385047913, 0.9120030999183655]
6 [0.9696987867355347, 1.0457327365875244, 0.9946709871292114, 1.0590217113494873, 1.0021713972091675]
7 [0.9622142314910889, 1.0551151037216187, 0.9889687895774841, 1.0397430658340454, 0.9631614685058594]
8 [0.9330625534057617, 1.0207873582839966, 0.9604312181472778, 1.0143687725067139, 0.9497082829475403]
9 [0.995154857635498, 1.0736563205718994, 1.0205495357513428, 1.0743745565414429, 0.9938861131668091]

Here's the CSV file if it helps to reproduce: https://gist.github.com/ankane/1604be5a95aa9536b87daa7e7c643023
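One pattern that would produce exactly this symptom (identical output across separate runs, diverging output within a run) is a random-number generator that is global to the process and seeded once, rather than re-seeded per model. The sketch below is purely illustrative of that hypothesis, not xLearn's actual internals; `ToyModel` and `init_weights` are made-up names:

```python
import random


class ToyModel:
    # Hypothetical: a process-global RNG, seeded once when the library loads,
    # shared by every model instance (this is the failure mode being illustrated,
    # not xLearn's confirmed behavior).
    _rng = random.Random(1)

    def __init__(self, seed=1):
        self.seed = seed  # stored, but the shared stream keeps advancing anyway

    def init_weights(self, n):
        # Each model draws the *next* values from the shared stream.
        return [ToyModel._rng.random() for _ in range(n)]


# Three "models" with the same seed get three different weight vectors...
weights = [ToyModel(seed=1).init_weights(3) for _ in range(3)]
print(weights[0] != weights[1])  # True

# ...yet a fresh process replays the exact same sequence, so whole-program
# reruns are reproducible, matching the behavior reported above.
replay = random.Random(1)
print(weights[0] == [replay.random() for _ in range(3)])  # True
```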

ankane commented 4 years ago

I also tried with model.fit("data.csv", is_lock_free=False) since the docs mention lock-free training can be non-deterministic, but it didn't change the results.
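Since the reported behavior is reproducible across separate runs, one possible workaround (a sketch under that assumption, not a confirmed fix) is to train each model in a fresh interpreter process so any process-global state is reset. The inner script here uses `random.Random` as a stand-in for the real `xl.FMModel(...).fit(...)` / `predict(...)` calls:

```python
import subprocess
import sys

# Stand-in training script; in practice this would construct the xLearn model,
# fit on data.csv, and print the predictions instead.
SCRIPT = r"""
import random
rng = random.Random(1)  # placeholder for xl.FMModel(task="reg", seed=1, ...)
print([rng.random() for _ in range(3)])  # placeholder for model.predict(...)
"""

# Each invocation starts a fresh interpreter, so any state seeded at startup
# is reset, mirroring the "consistent between runs" observation.
runs = [
    subprocess.run([sys.executable, "-c", SCRIPT],
                   capture_output=True, text=True).stdout
    for _ in range(3)
]
print(runs[0] == runs[1] == runs[2])  # True
```

The cost is one interpreter startup per model, which is usually negligible next to training time.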