btwardow / FactorizationMachines.jl

Factorization Machines for Julia

Benchmarks against other implementations #6

Closed Hydrotoast closed 8 years ago

Hydrotoast commented 8 years ago

Some other implementations to compare to:

Tasks:

Experiment approach

  1. Select a dataset and split it into a training set (X_train, y_train) and a test set (X_test, y_test)
  2. Download both libraries
  3. Train both libraries on X_train, y_train (and measure the training time)
  4. Verify that the test set evaluations are close enough on X_test, y_test
  5. Repeat the test 10 times
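
The steps above could be sketched as a small harness. This is only a rough illustration, not the script that ended up in the repository: `fit` and `predict` are hypothetical adapter functions that wrap whichever library is being measured, and the 5% relative tolerance used for "close enough" is an assumed threshold that would need to be agreed on.

```python
import math
import statistics
import time


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def benchmark(fit, predict, X_train, y_train, X_test, y_test, n_runs=10):
    """Train n_runs times, timing only the training call, and score each run.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns predictions; both are hypothetical adapters around one library.
    """
    times, scores = [], []
    for _ in range(n_runs):
        start = time.perf_counter()  # wall-clock timer
        model = fit(X_train, y_train)
        times.append(time.perf_counter() - start)
        scores.append(rmse(y_test, predict(model, X_test)))
    return {
        "median_time_s": statistics.median(times),
        "mean_rmse": statistics.fmean(scores),
    }


def close_enough(rmse_a, rmse_b, rel_tol=0.05):
    """Check that two test RMSE values agree within an assumed relative tolerance."""
    return math.isclose(rmse_a, rmse_b, rel_tol=rel_tol)
```

Each library would get its own `fit`/`predict` pair, and the two result dictionaries can then be compared with `close_enough(a["mean_rmse"], b["mean_rmse"])` before looking at the timings. Reporting the median rather than the mean time is a design choice here to reduce the effect of a single slow run.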

Implementing the experiment (up for discussion/alternatives)

  1. Write the benchmark script in Bash and use simple wall-clock time for measurements
  2. Save the script in a new benchmarks/ folder

Hydrotoast commented 8 years ago

This task is OS-dependent since there are various methods of installing the desired packages and executing them. I suppose the best approach would be to clone the respective repositories and run the corresponding installation scripts.

Hydrotoast commented 8 years ago

A script I wrote recently for benchmarking against fastfm:

import numpy as np

from fastFM import sgd
from scipy.sparse import hstack
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import mean_squared_error
from math import sqrt

# Load the training and test splits (svmlight format)
X_train, y_train = load_svmlight_file("ml100k_train.txt.clean")
n_train = X_train.shape[1]
X_test, y_test = load_svmlight_file("ml100k_test.txt.clean")
m_test, n_test = X_test.shape
# Pad the test matrix with zero columns so it has as many features as the
# training matrix (assumes n_test <= n_train)
X_test = hstack((X_test, np.zeros((m_test, n_train - n_test), dtype=np.float64)), format="csc")

# Train with SGD and predict on the padded test set
fm = sgd.FMRegression(n_iter=10, init_stdev=0.01, rank=4, l2_reg_w=0.0, l2_reg_V=0.0, step_size=0.1)
fm.fit(X_train, y_train)
y_pred = fm.predict(X_test)

# Report the test RMSE
print(sqrt(mean_squared_error(y_test, y_pred)))

btwardow commented 8 years ago

If it's OS-dependent, what do you think about Dockerizing it?

Hydrotoast commented 8 years ago

Docker is a good idea for accurate/fair benchmarks; however, I am uncertain how much work would be required to get this working. Perhaps we can get something simple working first and move to Docker if this becomes a popular library across several platforms?

btwardow commented 8 years ago

OK. Once we have a Bash/Python/Julia script for running it, we are only one simple step away from encapsulating it inside a container.

Hydrotoast commented 8 years ago

Agreed. Time for me to review Docker.

Hydrotoast commented 8 years ago

Technically the PR for this issue has been merged, although the current implementation is slower than fastfm. I will close this issue now and open a new one after investigating the slow code.