TU-Berlin-DIMA / continuous-pipeline-deployment

Repository for continuous deployment of ML pipelines (latex files + code)

Improve Machine Learning details #16

Closed dbehrouz closed 6 years ago

dbehrouz commented 7 years ago

Several issues related to the machine learning details have been raised by the reviewers. We have to address the following:

dbehrouz commented 7 years ago

Both [1] and [2] report that a per-coordinate learning rate works best in online scenarios. Implementing it should not be an issue in Spark.

[1] He, Xinran, et al. "Practical lessons from predicting clicks on ads at Facebook." Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. ACM, 2014.

[2] McMahan, H. Brendan, et al. "Ad click prediction: a view from the trenches." Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.
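
For illustration, here is a minimal sketch of a per-coordinate learning rate in plain Python (not Spark, and not this repository's implementation). It assumes online logistic regression over sparse features and uses the schedule given in [2], where the step size for coordinate i is alpha / (beta + sqrt(sum of squared past gradients for i)). The class name `PerCoordinateSGD`, the method names, and the default values of `alpha` and `beta` are hypothetical. This is plain gradient descent with that schedule, not the FTRL-Proximal algorithm from [2].

```python
import math
from collections import defaultdict


class PerCoordinateSGD:
    """Online logistic regression with one learning rate per coordinate.

    Hypothetical sketch: each coordinate i keeps the running sum of its
    squared gradients, and its step size shrinks as that sum grows.
    """

    def __init__(self, alpha=0.1, beta=1.0):
        self.alpha = alpha                      # global scale (assumed default)
        self.beta = beta                        # smoothing term (assumed default)
        self.weights = defaultdict(float)       # w_i, starts at 0
        self.grad_sq_sum = defaultdict(float)   # sum of g_i^2 seen so far

    def predict(self, features):
        """Return P(label = 1) for a sparse example {coordinate: value}."""
        z = sum(self.weights[i] * v for i, v in features.items())
        z = max(min(z, 35.0), -35.0)            # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def learning_rate(self, i):
        """Step size for coordinate i: alpha / (beta + sqrt(sum of g_i^2))."""
        return self.alpha / (self.beta + math.sqrt(self.grad_sq_sum[i]))

    def update(self, features, label):
        """Take one online step on a single example; return the prediction
        made before the update (usable for progressive evaluation)."""
        p = self.predict(features)
        for i, v in features.items():
            g = (p - label) * v                 # log-loss gradient for coordinate i
            self.grad_sq_sum[i] += g * g        # include the current gradient
            self.weights[i] -= self.learning_rate(i) * g
        return p
```

Only the coordinates present in an example are touched, so a frequently seen feature ends up with a small step size while a rarely seen one keeps a step size close to alpha / beta.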

dbehrouz commented 7 years ago

I will create a new issue for implementing the per-coordinate learning rate.

dbehrouz commented 7 years ago

[2] states that:

Standard methods would assess the confidence of predictions of a fully-converged batch model without regularization; our models are online, do not assume IID data (so convergence is not even well defined), and heavily regularized.

Does this mean that convergence is not defined for online models? How should I tackle this? Ask Sebastian Schelter.