Do you use the "what-if" index for the generation of training data?

JC-Shi / Learned-Index-Benefits


Do you use the "what-if" index for the generation of training data? #1

Closed JiTao3 closed 1 year ago

JiTao3 commented 1 year ago

I noticed that the training samples in the file have already been encoded and processed. Was the "what-if" (hypothetical) index used to calculate Cost(q_j | c_i)?

JC-Shi commented 1 year ago

Hi, for the training samples, the model's objective is to predict the actual cost reduction ratio, so the "what-if" based hypothetical index is not used. For each workload, we created the index and then executed the workload 4 times. The average cost of the last three runs is then used as Cost(q_j|c_i). I hope this clears up your doubt. Thank you.
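As a rough illustration of the measurement procedure described above (not the repository's actual code), the sketch below times a workload four times, discards the first run, and averages the rest. The names `average_after_warmup`, `measure_cost`, and `cost_reduction_ratio` are hypothetical, `run_workload` stands in for whatever executes the queries against the database, and the ratio formula `1 - cost_with_index / cost_without_index` is an assumption, since the thread does not spell out the exact definition.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def average_after_warmup(latencies: Sequence[float], warmup: int = 1) -> float:
    """Average the measurements that remain after dropping the first `warmup` runs."""
    if len(latencies) <= warmup:
        raise ValueError("need more runs than warm-up runs")
    return mean(latencies[warmup:])


def measure_cost(run_workload: Callable[[], None], runs: int = 4, warmup: int = 1) -> float:
    """Execute the workload `runs` times and return the mean latency of the later runs.

    `run_workload` is a hypothetical callable that runs the workload once
    with the real index already created.
    """
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_workload()
        latencies.append(time.perf_counter() - start)
    return average_after_warmup(latencies, warmup)


def cost_reduction_ratio(cost_without_index: float, cost_with_index: float) -> float:
    """Assumed label definition: fraction of the baseline cost removed by the index."""
    if cost_without_index <= 0:
        raise ValueError("baseline cost must be positive")
    return 1.0 - cost_with_index / cost_without_index
```

With the default arguments, `measure_cost` reproduces the "4 runs, average of the last three" scheme; dropping the first run is a common way to keep cold-cache effects out of the average.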

JiTao3 commented 1 year ago

Thank you very much for your reply. As I understand it, training uses the measured latency rather than the estimated cost. Is that right? Creating an index and actually executing the plan takes much longer than using a hypothetical index, although it does give more precise results.

JC-Shi commented 1 year ago

Yes, the actual query execution latency is used for training. Creating real indexes and actually executing the queries is much more time-consuming than "what-if" based estimation, but the results are accurate. Because the model is trained offline before it is put to use, when the trained model is adopted for index benefit estimation, its inference time should be less than the time needed to create a hypothetical index. Thank you.

JiTao3 commented 1 year ago

Thank you very much for your answer.