----------------------- REVIEW 1 ---------------------
PAPER: 658
TITLE: A Novel Approach to Continuous Training of Large Scale Machine Learning Models
AUTHORS: Behrouz Derakhshan, Tilmann Rabl and Volker Markl
Overall evaluation: -2
Reviewer's confidence: 2
Originality: 2
Correctness: 3
Completeness: 1
Best paper award: no
----------- Strong Points -----------
----------- Weak Points -----------
1- Unclear novelty
2- Incomplete evaluation
3- Writing needs improvement
----------- Review -----------
The paper presents a solution to the problem of updating machine learning models in a timely fashion as data changes, while simultaneously answering user queries. The authors leverage the sampling step of the stochastic gradient descent (SGD) algorithm by sampling from both the historical data and the new data that has arrived since the last update. Compared to the trivial approach of complete model retraining, this paper achieves up to an order of magnitude faster model updates without degrading the quality of the model.
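(Author note: for my own reference, a minimal sketch of the sampling scheme the review summarizes, i.e. drawing each SGD mini-batch from a mix of historical and newly arrived data. All names, the linear model, and the squared loss are my assumptions for illustration, not taken from the paper.)

```python
import numpy as np

def sgd_step(w, X, y, lr=0.01):
    """One SGD step on a mini-batch for a linear model with squared loss."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def continuous_update(w, hist_X, hist_y, new_X, new_y,
                      batch_size=32, new_fraction=0.5, steps=100):
    """Update w by sampling each mini-batch from both historical
    data and data that arrived since the last update."""
    n_new = int(batch_size * new_fraction)
    n_hist = batch_size - n_new
    for _ in range(steps):
        hi = np.random.randint(0, len(hist_y), size=n_hist)
        ni = np.random.randint(0, len(new_y), size=n_new)
        X = np.vstack([hist_X[hi], new_X[ni]])
        y = np.concatenate([hist_y[hi], new_y[ni]])
        w = sgd_step(w, X, y)
    return w
```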
Comments
D1: I do not see any technical contributions. The fact that SGD can be used to update a model online is obvious. While I see this as an implementation exercise, I could not find new algorithmic contributions or insights into the update problem.
D2: Design choices are not explained or evaluated, trade-offs are not studied, and some problems, such as distributional drift, are only mentioned and never explained…
D3: The experimental evaluation is weak from an ML perspective. The authors do not show the distribution drift in either the training or the testing data. The testing datasets are small, and I doubt one can even observe distributional changes. The evaluation is performed on a single node, whereas Section 4.4 talks about extending the solution to a distributed environment.
D4: Section 4.4 contains an interesting discussion of training a model in a distributed environment. The paper states: "It is important that the model manager chooses the partitioning strategy in such a way, that when requested the predictions can be made quickly." This is true, but how is that realized? Some technical description is missing here.
D5: Presentation
The paper has multiple typos and grammatical errors. Some of the major errors:
a) Section 4.4: "However, in order to ensure scalability several considerations have to be made... The data manager can work on top of distributed file systems such as HDFS."
b) Section 5.2: "Naive implements a ... parameters are adjusted based on the gradient value."
c) Inconsistent naming of the dataset: "movie-len-100k" vs. "100K MovieLens".
d) Section 6, first sentence.
e) Section 6: "Image classifier use case ... creating much overhead." Bad sentence formation.
In the evaluation section, the comparisons are not clear from the graphs. For example, in Figure 5 the plots for buffer sizes 500 and 5000 are not distinguishable; the same holds for Figures 7, 9, and 10.
Figure 4: In this architecture, it is not clear how the model manager interacts with the data to predict the result.
Reviewer 1 challenges the technical contributions of the paper. While the underlying idea is simple, its implementation in a distributed environment, with a complete model manager, data partitioning, and sampling, is a challenging task. I should describe these challenges in more detail (a first sketch below).
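As a starting point for D4's partitioning question, a minimal sketch of one strategy the model manager could use: hash-partitioning per-key sub-models (e.g., one weight vector per item) across partitions, so a prediction request touches exactly one partition. Everything here, including all names, is a hypothetical illustration, not the paper's actual design.

```python
import numpy as np

class PartitionedModelManager:
    """Hash-partitions per-key linear sub-models so that a prediction
    for a given key is served by exactly one partition."""

    def __init__(self, num_partitions, dim):
        self.num_partitions = num_partitions
        self.dim = dim
        # Each partition holds the sub-models for the keys hashed to it.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_of(self, key):
        return hash(key) % self.num_partitions

    def get_weights(self, key):
        part = self.partitions[self._partition_of(key)]
        # Lazily initialize sub-models for unseen keys.
        return part.setdefault(key, np.zeros(self.dim))

    def predict(self, key, features):
        # A request touches a single partition, keeping latency low.
        return float(self.get_weights(key) @ features)

    def update(self, key, features, target, lr=0.01):
        # One SGD step on a linear sub-model with squared loss.
        w = self.get_weights(key)
        grad = 2.0 * (w @ features - target) * features
        self.partitions[self._partition_of(key)][key] = w - lr * grad
```

The design choice to illustrate here is locality: because the key-to-partition mapping is deterministic, both predictions and SGD updates for a key touch one partition only, which is what makes the partitioning strategy compatible with fast serving.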
TODO list: