Closed dselivanov closed 7 years ago
FYI: https://github.com/dselivanov/FM implements only SGD with AdaGrad, but it is very fast: SIMD-vectorized, with Hogwild!-style asynchronous updates.
Hogwild! is fast for very sparse data but not for dense data, right?
That's true, Hogwild! makes sense only for sparse data. On dense data, race conditions will negatively affect both convergence and memory access latency.
But FM itself is useful only with sparse data. I'm almost sure trees and multilayer perceptrons will outperform FM on dense data. Even if I were to implement FM for dense data, I would use dense linear algebra operations and mini-batch SGD instead of vanilla SGD, so that the work is parallelized through BLAS matrix operations.
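To make the dense mini-batch idea concrete, here is a rough NumPy sketch (not code from either repository). It assumes a second-order FM with squared loss and plain SGD; the function names `fm_predict_dense` and `fm_minibatch_step` are made up for illustration. Every step is a matrix product, so a multithreaded BLAS can do the parallelization.

```python
import numpy as np

def fm_predict_dense(w0, w, V, X):
    """FM predictions for a dense batch X of shape (B, n); V has shape (n, k)."""
    XV = X @ V  # (B, k), a single BLAS matrix product
    # Pairwise term: 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    pairwise = 0.5 * ((XV ** 2).sum(axis=1) - ((X ** 2) @ (V ** 2)).sum(axis=1))
    return w0 + X @ w + pairwise

def fm_minibatch_step(w0, w, V, X, y, lr=0.01):
    """One mini-batch SGD step on 0.5 * mean squared error (illustrative only)."""
    B = X.shape[0]
    XV = X @ V
    g = (fm_predict_dense(w0, w, V, X) - y) / B  # dLoss/dPrediction per sample
    grad_w0 = g.sum()
    grad_w = X.T @ g
    # dLoss/dv_if = sum_b g_b * x_bi * (XV_bf - v_if * x_bi)
    grad_V = X.T @ (g[:, None] * XV) - V * ((X ** 2).T @ g)[:, None]
    return w0 - lr * grad_w0, w - lr * grad_w, V - lr * grad_V
```

The step returns new parameters instead of updating in place, which keeps the sketch easy to check; a real implementation would likely reuse buffers.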
I'm not 100% sure I'm right, but I think I have an idea why the FTRL/TDAP solvers are very slow compared to vanilla SGD. Here you are updating all model parameters for every sample, which is huge overkill. We only need to update the parameters that correspond to non-zero entries in the input, i.e. for each sample, only the parameters for columns with non-zero values.
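A minimal sketch of what I mean, in NumPy. It uses plain SGD with squared loss rather than FTRL/TDAP, just to show the access pattern; `fm_predict` and `fm_sgd_step` are hypothetical names, and the sample is assumed to arrive as a pair of arrays (`idx`, `val`) holding the unique non-zero column indices and their values.

```python
import numpy as np

def fm_predict(w0, w, V, idx, val):
    """FM prediction for one sparse sample given its non-zero columns."""
    Vx = V[idx] * val[:, None]  # (nnz, k): only the rows for non-zero features
    s = Vx.sum(axis=0)          # (k,)
    return w0 + w[idx] @ val + 0.5 * (s @ s - (Vx * Vx).sum())

def fm_sgd_step(w0, w, V, idx, val, y, lr=0.1):
    """One SGD step on squared loss; touches only rows listed in idx.

    Assumes idx has no duplicates (in-place fancy-index updates would
    silently drop repeated indices). Updates w and V in place, returns w0.
    """
    Vx = V[idx] * val[:, None]
    s = Vx.sum(axis=0)
    pred = w0 + w[idx] @ val + 0.5 * (s @ s - (Vx * Vx).sum())
    g = pred - y  # dLoss/dPrediction for 0.5 * (pred - y)^2
    w[idx] -= lr * g * val
    # dPrediction/dv_if = x_i * (s_f - v_if * x_i); zero when x_i == 0
    V[idx] -= lr * g * (val[:, None] * (s[None, :] - Vx))
    return w0 - lr * g
```

The cost per sample is proportional to the number of non-zero entries times the number of factors, not to the total number of columns, because the gradient for every parameter whose feature is zero is exactly zero.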