cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0

Add first implementation of augmentedNN to predict selectivity #1473

Closed: yetiancn closed this 6 years ago

yetiancn commented 6 years ago

The model is an initial implementation to predict selectivity for range predicates. It can be applied to queries like: SELECT * FROM table WHERE c >= l AND c <= u.
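For reference, the quantity being predicted can be computed exactly when the column values are available. The sketch below is only an illustration of what "selectivity of a range predicate" means here; `range_selectivity` is a hypothetical helper, not a function from this PR.

```python
import bisect


def range_selectivity(column_values, lower, upper):
    """Fraction of rows whose value c satisfies lower <= c <= upper.

    Hypothetical helper for illustration; not part of the PR.
    """
    if not column_values:
        return 0.0
    ordered = sorted(column_values)
    # bisect_left finds the first position whose value is >= lower.
    first = bisect.bisect_left(ordered, lower)
    # bisect_right finds the first position whose value is > upper.
    last = bisect.bisect_right(ordered, upper)
    # An empty range (lower > upper) yields a selectivity of 0.
    return max(last - first, 0) / len(ordered)


# Ten rows holding the values 1..10; five of them fall in [3, 7].
print(range_selectivity(range(1, 11), 3, 7))  # 0.5
```

A learned model would approximate this value from (l, u) without scanning the column.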

I implemented the model in augmentedNN.py and the C++ wrapper in augmentedNN.cpp, using LSTM.py and LSTM.cpp as references. The hyperparameters, especially the number of training epochs, still need to be discussed based on experiments on the real system. I also added test cases for the model, covering a uniformly distributed dataset and a skewed dataset.

There are two classes defined.

  1. class AugmentedNN (in augmentedNN.cpp). This class is just like class TimeSeriesLSTM.

    • Fit(): applies backpropagation.
    • Predict(): returns the predictions for the input.
    • TrainEpoch(): trains for one epoch.
    • ValidateEpoch(): runs one epoch of validation.
  2. class TestingAugmentedNNUtil (in testing_forecast_util.cpp)

    • GetData(): generates data for training and testing. The dataset is either uniformly distributed or skewed.
    • Test(): calls the APIs mentioned above to train and test the model.
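To make the uniform and skewed test data concrete, here is a minimal sketch of how such datasets could be generated. It is not the code of GetData(): the function names, the value range [0, 1), and the choice of skew (cubing a uniform sample) are all assumptions made for illustration.

```python
import bisect
import random


def make_column(n_rows, skewed, seed=0):
    """Return a sorted list of n_rows column values in [0, 1).

    Hypothetical stand-in for the test data; the skew used here
    (cubing a uniform sample) is an assumption, not the PR's choice.
    """
    rng = random.Random(seed)
    if skewed:
        # Cubing pushes most of the mass toward 0.
        values = [rng.random() ** 3 for _ in range(n_rows)]
    else:
        values = [rng.random() for _ in range(n_rows)]
    values.sort()
    return values


def make_samples(column, n_queries, seed=1):
    """Return (lower, upper, selectivity) triples for random range predicates."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_queries):
        lower, upper = sorted((rng.random(), rng.random()))
        # Count rows with lower <= c <= upper on the sorted column.
        first = bisect.bisect_left(column, lower)
        last = bisect.bisect_right(column, upper)
        samples.append((lower, upper, (last - first) / len(column)))
    return samples


uniform_column = make_column(1000, skewed=False)
skewed_column = make_column(1000, skewed=True)
uniform_samples = make_samples(uniform_column, 50)
skewed_samples = make_samples(skewed_column, 50)
```

Each triple pairs the predicate bounds (the model input) with the true selectivity (the label), so the same generator can feed both training and testing.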

By the way, the argument passed to matrix_eig::bottomRows in testing_forecast_util.cpp was wrong: it should be the number of rows to take, counted from the bottom of the matrix. I have fixed it; please check that the change is correct.
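The distinction is between passing a row count and passing a row index. The sketch below uses plain Python lists as a stand-in for the matrix, assuming matrix_eig is an Eigen matrix type whose bottomRows(n) returns the last n rows. `bottom_rows` and the split position are hypothetical; the PR text does not show the actual call.

```python
def bottom_rows(matrix, n):
    """Return the last n rows of a list-of-rows matrix.

    Stand-in for bottomRows(n): n is a count of rows, not a start index.
    """
    if not 0 <= n <= len(matrix):
        raise ValueError("n must be between 0 and the number of rows")
    return matrix[len(matrix) - n:]


matrix = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
train_rows = 3  # hypothetical split: first 3 rows train, the rest test

# Correct: ask for the number of remaining rows.
test_part = bottom_rows(matrix, len(matrix) - train_rows)
print(test_part)  # [[3, 3], [4, 4]]

# Passing the split position instead would return 3 rows and
# overlap with the training part.
overlapping = bottom_rows(matrix, train_rows)
print(overlapping)  # [[2, 2], [3, 3], [4, 4]]
```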

coveralls commented 6 years ago


Coverage decreased (-0.2%) to 76.528% when pulling dc1a0753b4d1e4734e4033aea8ef87657d45f4d7 on yetiancn:master into 1fc8b5586162afb7a2f5607256abed149f74a665 on cmu-db:master.