geffy / tffm

TensorFlow implementation of an arbitrary order Factorization Machine
MIT License

Scaling benchmarks #8

Open benmccann opened 8 years ago

benmccann commented 8 years ago

I've been looking at Spark implementations of Factorization Machines. I found that none of the existing open source implementations scale to a dataset with millions of features and hundreds of millions of examples. I'd be curious how this implementation is able to scale.

geffy commented 8 years ago

Hi @benmccann, I believe you should check https://github.com/dmlc/difacto -- from my point of view, it is the most scalable solution. Btw, libffm (https://www.csie.ntu.edu.tw/~cjlin/libffm/) is a good pure C++ implementation of FFM, which I've been able to run on my laptop on a dataset with ~10k features (25 non-zeros) and ~30M samples.

tffm is mostly meant for research purposes, so I wouldn't expect it to scale really well.

kopopt commented 8 years ago

@geffy @benmccann I've been learning TensorFlow recently and developed a distributed factorization machine implementation. I wrote some custom operators so that its performance is comparable to difacto's. You're welcome to take a look and give suggestions :) Thanks.

https://github.com/kopopt/fast_tffm

arita37 commented 6 years ago

I might be able to test; I just need to convert the Criteo dataset into the tffm input format. Is there any reference for the input format?
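
As far as the tffm README describes it, the models take in-memory arrays (a NumPy array, or a SciPy CSR matrix when `input_type='sparse'` is set) rather than a file format of their own; treat that as an assumption to verify against the current code. Under that assumption, one rough way to prepare Criteo's tab-separated rows (label, 13 integer columns, 26 categorical columns, empty string for missing values) is to hash every feature into a fixed number of buckets and collect CSR-style arrays. The sketch below uses only the standard library; the function names, the bucket count, and the hashing scheme are illustrative choices and not part of tffm.

```python
import zlib

N_INT = 13           # integer feature columns in the Criteo Kaggle layout
N_CAT = 26           # categorical feature columns
N_BUCKETS = 2 ** 20  # size of the hashed feature space (arbitrary choice)


def parse_criteo_line(line, n_buckets=N_BUCKETS):
    """Turn one tab-separated Criteo row into (label, column_ids, values)."""
    fields = line.rstrip("\n").split("\t")
    label = int(fields[0])
    row = {}
    for slot, raw in enumerate(fields[1:1 + N_INT + N_CAT]):
        if raw == "":
            continue  # missing value: leave the entry out of the sparse row
        if slot < N_INT:
            # Integer column: one bucket per column, raw number as the value.
            token, value = "I%d" % slot, float(raw)
        else:
            # Categorical column: one bucket per (column, category), value 1.
            token, value = "C%d=%s" % (slot - N_INT, raw), 1.0
        # crc32 is deterministic across runs, unlike the built-in hash().
        col = zlib.crc32(token.encode("utf-8")) % n_buckets
        row[col] = row.get(col, 0.0) + value  # merge hash collisions
    cols = sorted(row)
    return label, cols, [row[c] for c in cols]


def to_csr(lines, n_buckets=N_BUCKETS):
    """Collect rows into CSR-style lists: labels, indptr, indices, data."""
    labels, indptr, indices, data = [], [0], [], []
    for line in lines:
        label, cols, vals = parse_criteo_line(line, n_buckets)
        labels.append(label)
        indices.extend(cols)
        data.extend(vals)
        indptr.append(len(indices))
    return labels, indptr, indices, data
```

If SciPy is available, the three lists can be wrapped as `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(labels), N_BUCKETS))`. In practice the integer columns would probably also need scaling or a log transform before training, which this sketch leaves out.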