Angel-ML / angel

A Flexible and Powerful Parameter Server for large-scale machine learning

convergence problems when using FTRL optimizer #351

Closed futurecam closed 5 years ago

futurecam commented 6 years ago

Hi, Angels,

We are using Angel on Spark for large-scale wide-model training to serve CTR prediction, so the features are very sparse and the objective is logloss. When we switched the optimizer from LBFGS to FTRL, we ran into some convergence problems:

1. Training the same dataset twice with the same parameters gives different logloss and AUC, and one of the two runs diverges in the first epoch. There is no random shuffling of the data during training, and we did not set angel.staleness, so I assume the default value of 0 (BSP).
2. Training the same dataset twice, but with one run using only a subset of the features: the run with the feature subset cannot converge, and tuning the parameters does not help.

Because all of the cases above converge well with LBFGS, I suspect that the convergence of FTRL is sensitive to the data in distributed learning. I am not sure whether the convergence guarantee of FTRL on a parameter server still holds when Z/N are computed on the workers and updated on the server. Or do we need some other mechanism, such as a trust-region method, to avoid divergence? Can anyone help?
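
For reference, by Z and N I mean the per-coordinate state of the standard FTRL-Proximal update (McMahan et al., 2013). The sketch below is not Angel's code, only my understanding of the single-coordinate update that each worker would compute before pushing Z/N deltas to the server; alpha, beta, lambda1 and lambda2 are the usual hyperparameters:

```scala
// A self-contained sketch of the standard per-coordinate FTRL-Proximal update
// (McMahan et al., 2013). Not Angel's implementation; only meant to make
// explicit what "updating Z and N" means for a single coordinate.
object FtrlCoordinateSketch {
  def update(z: Double, n: Double, w: Double, g: Double,
             alpha: Double, beta: Double,
             lambda1: Double, lambda2: Double): (Double, Double, Double) = {
    // accumulate gradient statistics for this coordinate
    val sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / alpha
    val zNew  = z + g - sigma * w
    val nNew  = n + g * g
    // closed-form weight with L1/L2 regularization
    val wNew =
      if (math.abs(zNew) <= lambda1) 0.0
      else -(zNew - math.signum(zNew) * lambda1) /
            ((beta + math.sqrt(nNew)) / alpha + lambda2)
    (zNew, nNew, wNew)
  }
}
```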

The FTRL optimizer we use is async mini-batch FTRL:

```scala
for (epochId <- 0 until epoch) {
  val tempRDD = trainSet.mapPartitions { iter =>
    iter.toArray.sliding(batchSize, batchSize)
      .map { batch =>
        // ftrl.optimize updates the model parameters Z and N
        ftrl.optimize(batch, calcGradientLoss)
      }
  }
  tempRDD.count()
}
```
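
Note that mapPartitions is lazy, so tempRDD.count() is only there to force the per-partition work to run; each partition walks through its own batches and calls ftrl.optimize independently of the other partitions, so their updates interleave on the servers, which is why we describe the scheme as asynchronous.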

QueenOfGalaxy commented 6 years ago

we encountered the same problem

luxin0123 commented 6 years ago

First, thanks for your question @futurecam. We tested the convergence of FTRL on several datasets and got good, reasonable results before releasing it. About training async mini-batch FTRL, I have the following suggestions:

  1. Unlike sync mini-batch FTRL, async is indeed more difficult to train. The batchSize parameter is very important, especially for datasets whose distribution is irregular. In that case you should use a small batch size, between 500 and 5000 or even smaller. The alpha and beta parameters, which control the learning rate, also need to be tuned (see the sketch after this list).
  2. As for LBFGS, we know that computing the Hessian matrix is very hard for high-dimensional datasets. For a convex loss function it can reach the same optimal solution as FTRL, but the asynchronous mechanism of FTRL-async makes training more difficult.
  3. Recently, Angel released the 2.0.0 branch, which supports a synchronous FTRL (FTRL-sync) method that may converge more easily. We sincerely invite you to try it and share your feedback.
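
To make point 1 a bit more concrete: FTRL's per-coordinate step size is alpha / (beta + sqrt(n)), where n is the accumulated squared gradient of that coordinate. The sketch below is only an illustration of that formula (it is not Angel's implementation); it shows how quickly the step size decays as n grows, which is why alpha, beta and the batch size have to be tuned together:

```scala
// Illustration only (not Angel's code): FTRL's per-coordinate learning rate.
object FtrlLearningRateSketch {
  // eta = alpha / (beta + sqrt(n)), where n accumulates squared gradients
  def eta(alpha: Double, beta: Double, n: Double): Double =
    alpha / (beta + math.sqrt(n))

  def main(args: Array[String]): Unit = {
    // Larger batches fold more gradient mass into n per update,
    // so the effective step size shrinks faster for the same amount of data.
    for (n <- Seq(0.0, 1.0, 100.0, 10000.0))
      println(f"alpha=0.1, beta=1.0, n=$n%.0f -> eta=${eta(0.1, 1.0, n)}%.5f")
  }
}
```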