----------------------- REVIEW 2 ---------------------
PAPER: 658
TITLE: A Novel Approach to Continuous Training of Large Scale Machine Learning Models
AUTHORS: Behrouz Derakhshan, Tilmann Rabl and Volker Markl
Overall evaluation: 1
Reviewer's confidence: 2
Originality: 4
Correctness: 3
Completeness: 2
Best paper award: no
----------- Strong Points -----------
(+) The paper presents a new solution to a known problem.
(+) The paper is clearly written.
(+) The experiments are solid.
----------- Weak Points -----------
(-) The innovation of this work is marginal and could be strengthened.
(-) Although the paper is clearly written, there is some room for further improvement in the presentation.
(-) The proposed solution would benefit from a more in-depth examination of the sampling and scheduling rate parameters and their impact on overall model quality.
----------- Review -----------
This paper presents a new solution that improves upon prior work on deployment strategies for machine learning models based on stochastic gradient descent (SGD).
In general, this work most closely relates to the previous work in [7], which proposes the Velox system to address this problem. The primary difference between this work and Velox is that, after an initial machine learning model is deployed, Velox continuously updates the model until its quality drops below a pre-determined threshold, at which point a complete re-training of the model is performed. The authors propose instead a selective continuous retraining: SGD iterations are performed on a subset of the available data, with a bias toward recently received training examples, and the results of these scheduled iterations are then integrated into the currently stored model.
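To make my reading of the approach concrete, here is a minimal sketch of the update step I believe the authors describe (the function names, the linear model, and the exponential form of the recency bias are my own assumptions, not taken from the paper):

```python
import numpy as np

def sample_recent_biased(n, size, decay=0.01, rng=np.random.default_rng(0)):
    # Weight newer examples (higher index) more heavily; the exponential
    # decay is my own guess at the bias, the paper may weight differently.
    w = np.exp(-decay * np.arange(n)[::-1])
    return rng.choice(n, size=size, p=w / w.sum())

def scheduled_sgd_update(weights, X, y, sample_size, lr=0.1):
    # One scheduled iteration: SGD over a recency-biased subset of the
    # buffered data, folded directly into the deployed weights.
    for i in sample_recent_biased(len(X), sample_size):
        pred = X[i] @ weights                      # linear model, squared loss
        weights = weights - lr * (pred - y[i]) * X[i]
    return weights

# Toy usage: 1000 buffered examples, one scheduled pass over a sample of 100.
X = np.random.randn(1000, 5)
y = X @ np.ones(5)
w = scheduled_sgd_update(np.zeros(5), X, y, sample_size=100)
```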
This paper is clearly written, but a thorough proofreading would enable the authors to increase the quality of the write-up with minimal effort.
The experimentation is thorough and well explained, using two well-known datasets that represent different types of data distributions, thus highlighting the performance of the proposed system under multiple circumstances.
The innovation of this work could be improved and expanded if the authors incorporated some of the smart parameter tuning techniques they introduce in Section 5.
In Section 5.4 the authors write, "Based on our findings, we conclude that increasing the sampling and scheduling rate does not always have an effect on the quality." This claim stops short of providing interesting insight into the impact of these parameters. I would like the authors to discuss the effect of the data distribution on model quality as it relates to the choice of sampling and scheduling rate.
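For concreteness, this is my reading of how the two knobs enter the deployment loop (the parameter names and loop structure are my own, building on the sketch above):

```python
def deployment_loop(weights, stream, sampling_rate, scheduling_rate):
    # sampling_rate: fraction of the buffered data drawn per scheduled iteration.
    # scheduling_rate: scheduled iterations per arriving example (e.g. 0.01
    # triggers one iteration every 100 examples).
    buffer = []
    period = max(1, int(round(1 / scheduling_rate)))
    for t, (x, y) in enumerate(stream, start=1):
        buffer.append((x, y))                    # new training example arrives
        if t % period == 0:
            X = np.array([b[0] for b in buffer])
            Y = np.array([b[1] for b in buffer])
            size = max(1, int(sampling_rate * len(buffer)))
            weights = scheduled_sgd_update(weights, X, Y, sample_size=size)
    return weights
```

Under this reading, how a given sampling rate interacts with quality plausibly depends on how quickly the buffered data distribution drifts, which is exactly the discussion I would like to see.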
Relatively small improvements in model quality are seen with this technique, especially considering the results on the MNIST dataset, but the improvements in performance are consistent. Perhaps the emphasis of this work should be that there is a more efficient solution to the model deployment problem that does not require a sacrifice in model quality. It is also possible that a more thorough examination of the sampling and scheduling rate parameters would allow the proposed solution to show more consistent improvement in model quality as well.
Again, in Section 5.5 the authors claim that the difference between neural networks and matrix factorization techniques explains the different error rate trends observed on the two experimental datasets. Wouldn't the data distribution be a much stronger, or at least a contributing, reason for the results being so different, rather than the ML technique alone?