This paper proposes a deployment method that continuously updates the model, combining model training and model serving. Its main goal is to guarantee both model freshness (always serving the most recently updated model) and model quality (using SGD to update the model on every mini-batch). However, online training with SGD or its variants has already been deployed by many industrial companies, such as Google [1]. The authors should compare their method with FTRL and other online learning algorithms.
[1] Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization.
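For context, the per-coordinate FTRL-Proximal update from [1] fits in a few lines. The sketch below is a minimal implementation for logistic regression over one-hot features (each example is a list of active feature indices); the hyperparameter values are illustrative defaults, not anything from the paper under review.

```python
import math

class FTRLProximal:
    """Minimal per-coordinate FTRL-Proximal for logistic regression.
    alpha/beta control per-coordinate learning rates; l1/l2 are
    regularization strengths (L1 yields exact sparsity)."""
    def __init__(self, dim, alpha=0.5, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated adjusted gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def _weight(self, i):
        # Closed-form weight: exactly zero while |z_i| <= l1 (sparsity).
        if abs(self.z[i]) <= self.l1:
            return 0.0
        sign = -1.0 if self.z[i] < 0 else 1.0
        return -(self.z[i] - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        # x: list of active (value-1) feature indices.
        s = sum(self._weight(i) for i in x)
        return 1.0 / (1.0 + math.exp(-max(min(s, 35.0), -35.0)))

    def update(self, x, y):
        # y in {0, 1}; gradient of log loss is (p - y) per active feature.
        p = self.predict(x)
        g = p - y
        for i in x:
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
```

This is exactly the kind of baseline the experiments should include: it updates on every example (so it matches the paper's freshness goal) while keeping the model sparse.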
Strong points
S1: Online learning has been widely deployed by industrial companies for CTR prediction, and balancing model quality against model freshness is an important problem.
S2: The authors clearly describe the process and requirements of online advertising.
Weak points
W1: The proposed method, which employs SGD to combine model training and model serving, is not new. SGD and its variants have already been used by industrial companies to perform online learning.
W2: The experiments are not convincing. The authors only compare their method with periodical training, which trains a new model from scratch once a day. They should also compare against other online learning algorithms for logistic regression, such as FTRL, as well as an incremental-training baseline.
W3: The second contribution of this paper is also weak. When performing online training, preprocessing steps such as labeling, one-hot encoding, and feature scaling are naturally accomplished in real time anyway.
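To illustrate why streaming one-hot encoding is unremarkable: with the hashing trick it becomes completely stateless, since no global vocabulary has to be maintained while the stream runs. A minimal sketch (the function name and feature fields are illustrative, not from the paper):

```python
import hashlib

def hash_one_hot(raw_features, dim=2**20):
    """Stateless one-hot encoding via the hashing trick: each
    'field=value' string maps deterministically to a fixed index in
    [0, dim), so encoding needs no dictionary lookups or coordination."""
    idx = set()
    for field, value in raw_features.items():
        h = hashlib.md5(f"{field}={value}".encode()).hexdigest()
        idx.add(int(h, 16) % dim)
    return sorted(idx)
```

Because the mapping is a pure function of the input, it works identically at training and serving time, which is why real-time preprocessing is not in itself a strong contribution.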
Detailed Evaluation
D1: The authors should simulate a longer period to evaluate model quality and test the model on more data. They only simulated two days of data and tested the model on one day's data, which is unrepresentative of real applications. Moreover, they should report AUC rather than loss to evaluate model quality.
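Reporting AUC requires no extra tooling; it can be computed directly from ranks (the probability that a random positive is scored above a random negative, with ties counted as one half). A small self-contained sketch:

```python
def auc(labels, scores):
    """AUC via the rank-sum statistic; tied scores get the average rank."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum, i = 0.0, 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1                       # [i, j) is a block of tied scores
        avg_rank = (i + 1 + j) / 2.0     # average of 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Unlike raw loss, AUC is insensitive to miscalibration of the predicted probabilities, which matters when comparing models trained under different sampling schemes.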
D2: In Sec. 4.7, it is unclear why continuous training can outperform periodical training by 5x. In the periodical setting everything is retrained from scratch, but there are well-known optimizations: for example, one can perform incremental training that warm-starts from the pre-trained model and only processes the new data.
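The warm-start baseline suggested above can be sketched as follows. All names here (`sgd_epoch`, `incremental_retrain`) are illustrative stand-ins, not the paper's API; examples are `(features, label)` pairs with features as a sparse `{index: value}` dict.

```python
import copy
import math

def sgd_epoch(weights, data, lr=0.1):
    """One pass of SGD for logistic regression over sparse examples."""
    for x, y in data:
        s = sum(weights.get(i, 0.0) * v for i, v in x.items())
        p = 1.0 / (1.0 + math.exp(-max(min(s, 35.0), -35.0)))
        g = p - y
        for i, v in x.items():
            weights[i] = weights.get(i, 0.0) - lr * g * v
    return weights

def retrain_from_scratch(days_of_data, epochs=3):
    """Periodical baseline: must replay the full history every time."""
    w = {}
    for _ in range(epochs):
        for day in days_of_data:
            w = sgd_epoch(w, day)
    return w

def incremental_retrain(prev_weights, new_day, epochs=3):
    """Warm start from yesterday's model and train only on new data."""
    w = copy.deepcopy(prev_weights)
    for _ in range(epochs):
        w = sgd_epoch(w, new_day)
    return w
```

Since the incremental variant touches only one day of data instead of the whole history, a fair speed comparison against continuous training should include it, not just the from-scratch baseline.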
D3: Why does materialization further accelerate performance by 18x? It requires an extra disk step, which should degrade speed rather than improve it.
D4: In FTRL, subsampling is employed to skip a fraction of negative samples. How does your sampling method compare with it?
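For reference, the subsampling scheme in the FTRL paper keeps every positive, keeps each negative with probability r, and up-weights the kept negatives by 1/r so the training loss remains unbiased in expectation. A minimal sketch (names illustrative):

```python
import random

def subsample_stream(stream, neg_rate=0.1, seed=0):
    """Yield (features, label, importance_weight) triples.
    Positives are always kept; negatives are kept with probability
    neg_rate and weighted 1/neg_rate to keep the loss unbiased."""
    rng = random.Random(seed)
    for x, y in stream:
        if y == 1:
            yield x, y, 1.0
        elif rng.random() < neg_rate:
            yield x, y, 1.0 / neg_rate
        # otherwise the negative example is dropped entirely
```

Any comparison of sampling methods should state both the keep rate and how (or whether) the resulting bias is corrected.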
To Do
[ ] Read Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization.
[ ] Compare with other online learning methods (e.g., FTRL) and with incremental learning approaches
[ ] Increase the length of the simulation and use a larger test set
[ ] Investigate more appropriate metrics for evaluating model quality (e.g., AUC)
[ ] Investigate better sampling techniques (Refer to FTRL)