evanwang1990 / FMwR

FMwR: A Library of Factorization Machines in R Based on libfm
GNU General Public License v3.0

Guess about why FTRL learner is slow compared to SGD #5

Closed dselivanov closed 7 years ago

dselivanov commented 7 years ago

I'm not sure that I'm 100% right, but I think I see why the FTRL/TDAP solvers are very slow compared to vanilla SGD. Here you are updating all model parameters for a given sample. That is huge overkill: we only need to update the parameters that correspond to non-zero entries in the input. So for each sample, only the parameters for columns with non-zero values should be touched.
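A minimal sketch of the sparse-update idea (illustrative Python with hypothetical names, not FMwR's actual R/C++ code): an FTRL-proximal step for a logistic linear model that touches only `idx`, the sample's non-zero columns. With, say, a million features and ~50 non-zeros per row, this does ~50 coordinate updates per sample instead of a million.

```python
import numpy as np

def ftrl_weights(z, n, idx, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
    """Recover weights lazily, only at the non-zero coordinates idx,
    via the FTRL-proximal closed form with L1/L2 regularization."""
    w = np.zeros(len(idx))
    zi, ni = z[idx], n[idx]
    active = np.abs(zi) > l1  # coordinates not zeroed out by L1
    w[active] = -(zi[active] - np.sign(zi[active]) * l1) / \
                ((beta + np.sqrt(ni[active])) / alpha + l2)
    return w

def ftrl_update(z, n, idx, vals, y, alpha=0.1):
    """One FTRL-proximal step for a sparse sample (idx, vals) with
    label y in {0, 1}; only the accumulators z, n at idx are modified."""
    w = ftrl_weights(z, n, idx, alpha)
    p = 1.0 / (1.0 + np.exp(-w @ vals))   # logistic prediction
    g = (p - y) * vals                    # gradient only at non-zeros
    sigma = (np.sqrt(n[idx] + g * g) - np.sqrt(n[idx])) / alpha
    z[idx] += g - sigma * w
    n[idx] += g * g
```

The same restriction applies to the factor matrix of an FM: the gradient of every factor row for a zero-valued feature is zero, so those rows can be skipped entirely.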

dselivanov commented 7 years ago

FYI - https://github.com/dselivanov/FM implements only SGD with AdaGrad, but it is very fast: SIMD-vectorized, with Hogwild!-style asynchronous updates.
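For context, the AdaGrad part just means per-coordinate step sizes: each parameter accumulates its squared gradients and scales its step by the inverse square root, which combines naturally with sparse updates since only the touched coordinates accumulate. A sketch (illustrative Python, not code from that repo):

```python
import numpy as np

def adagrad_step(w, g2, idx, grad, lr=0.1, eps=1e-8):
    """AdaGrad update at the non-zero coordinates idx: the effective
    per-coordinate step size shrinks as squared gradients accumulate."""
    g2[idx] += grad * grad
    w[idx] -= lr * grad / np.sqrt(g2[idx] + eps)
```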

evanwang1990 commented 7 years ago

Hogwild! is fast for very sparse data but not for dense data, right?

dselivanov commented 7 years ago

That's true, Hogwild! makes sense only for sparse data. On dense data, race conditions will hurt convergence, and memory access latency suffers as well.

But FM itself is useful only with sparse data; I'm almost sure trees and multilayer perceptrons would outperform FM on dense data. Even if I were to implement FM for dense data, I would use dense linear algebra operations and mini-batch SGD instead of vanilla SGD, so the work would be parallelized via BLAS matrix operations.
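For the dense mini-batch case, the FM score can indeed be expressed as matrix products so a BLAS does the heavy lifting, using the standard O(k·d) reformulation of the pairwise term. A sketch in Python/NumPy (illustrative, not FMwR code):

```python
import numpy as np

def fm_scores(X, w0, w, V):
    """FM predictions for a dense mini-batch X (b x d) with bias w0,
    linear weights w (d,), and factor matrix V (d x k). The pairwise
    term uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((X V)_f^2 - X^2 (V^2)_f)
    so the whole batch reduces to two matrix products (BLAS GEMMs)."""
    XV = X @ V                                            # b x k
    pairwise = 0.5 * ((XV ** 2) - (X ** 2) @ (V ** 2)).sum(axis=1)
    return w0 + X @ w + pairwise
```

A gradient over the batch follows the same pattern, which is why mini-batching maps so well onto dense linear algebra.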