cmu-db / dbgym

An end-to-end research vehicle for studying self-driving DBMSs.
MIT License

feat: add support for selecting the best trained embedding in Proto-X. #3

Closed by wangpatrick57 4 months ago

wangpatrick57 commented 5 months ago

This PR adds functionality for selecting the best trained embedding in Proto-X. It extends previous functionality under the protox embedding train module.

Example invocation:

python task.py --no-startup-check protox embedding train tpch --iterations-per-epoch 1 --num-samples 4 --train-max-concurrent 4 --num-points-to-sample 32 --max-segments 3

Summary: Embedding analysis and selection (steps 3 and 4 of the "embedding" stage of Proto-X) now run as a single command without crashing. "As a single command" is significant because this previously involved 4 Python scripts and 3 shell scripts. Now, all of these are Python functions runnable with a single command.
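To illustrate the "scripts become functions behind one command" idea, here is a minimal sketch using only the standard library's argparse. It is not the code in this PR: the step functions (`train_embeddings`, `analyze_embeddings`, `select_best_embedding`), their return values, and the parser layout are hypothetical. Only the subcommand words and flag names are taken from the example invocation above.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Nested subcommands mirroring: task.py protox embedding train <benchmark>."""
    parser = argparse.ArgumentParser(prog="task.py")
    parser.add_argument("--no-startup-check", action="store_true")

    systems = parser.add_subparsers(dest="system", required=True)
    protox = systems.add_parser("protox")
    stages = protox.add_subparsers(dest="stage", required=True)
    embedding = stages.add_parser("embedding")
    actions = embedding.add_subparsers(dest="action", required=True)

    train = actions.add_parser("train")
    train.add_argument("benchmark")
    # Flag names come from the example invocation; the real defaults are
    # unknown, so none are assumed here.
    for flag in (
        "--iterations-per-epoch",
        "--num-samples",
        "--train-max-concurrent",
        "--num-points-to-sample",
        "--max-segments",
    ):
        train.add_argument(flag, type=int, default=None)
    return parser


# Hypothetical stand-ins for what used to be separate scripts.
def train_embeddings(args: argparse.Namespace) -> str:
    return f"trained embeddings for {args.benchmark}"


def analyze_embeddings(args: argparse.Namespace) -> str:
    return f"analyzed embeddings with {args.num_points_to_sample} sampled points"


def select_best_embedding(args: argparse.Namespace) -> str:
    return "selected best embedding"


def run(argv: List[str]) -> List[str]:
    """Parse one command line and run every step in order, in one process."""
    args = build_parser().parse_args(argv)
    steps = [train_embeddings, analyze_embeddings, select_best_embedding]
    return [step(args) for step in steps]
```

Calling `run(sys.argv[1:])` from a script entry point would give the single-command behavior. Because each step is a plain function, the steps can share parsed arguments directly instead of passing state between shell scripts.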

Demo: pat_test.sh does a fast run without crashing (see video). Note that pat_test.sh only contains a single invocation of task.py. After the run, configs, dependencies, and results automatically appear in the run_*/ folder (see image).

https://github.com/cmu-db/dbgym/assets/20631215/b0d88243-9ada-4552-b870-d8d2c4b033fb

Screenshot 2024-02-27 at 10 02 56
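For inspecting the output of a run, a small helper like the following can locate the most recent run_*/ folder and list what landed in it. This is only a sketch: the helper names are hypothetical, the workspace path is whatever directory contains the run_*/ folders, and nothing is assumed about the layout inside a run folder.

```python
from pathlib import Path
from typing import List, Optional


def latest_run_dir(workspace: Path) -> Optional[Path]:
    """Return the most recently modified run_* directory, or None if there is none."""
    run_dirs = [p for p in workspace.glob("run_*") if p.is_dir()]
    if not run_dirs:
        return None
    return max(run_dirs, key=lambda p: p.stat().st_mtime)


def list_run_contents(run_dir: Path) -> List[str]:
    """List every file and folder under run_dir as sorted relative paths."""
    return sorted(p.relative_to(run_dir).as_posix() for p in run_dir.rglob("*"))
```

For example, `list_run_contents(latest_run_dir(Path(".")))` would show the configs, dependencies, and results of the last run, provided at least one run_*/ folder exists in the current directory.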

Details: