cmu-db / dbgym

An end-to-end research vehicle for studying self-driving DBMSs.
MIT License

feat: add support for training an embedding with Proto-X. #2

Closed by wangpatrick57 5 months ago

wangpatrick57 commented 5 months ago

This PR adds functionality for using Proto-X to build an embedding from training data. It assumes various dependencies that will land in subsequent PRs (e.g., training data, workload).

Example invocation:

python task.py --no-startup-check protox embedding train tpch --iterations-per-epoch 1 --num-samples 2
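For scripted runs, the same invocation can be assembled in Python. This is only a minimal sketch: `build_train_cmd` is a hypothetical helper that is not part of dbgym, and it uses only the subcommands and flags that appear in the example invocation above.

```python
import shlex


def build_train_cmd(benchmark="tpch", iterations_per_epoch=1, num_samples=2):
    """Return the argv list for the embedding-training invocation.

    Hypothetical helper for illustration only; it is not part of dbgym.
    The flags mirror the example invocation in this PR description.
    """
    return [
        "python", "task.py",
        "--no-startup-check",
        "protox", "embedding", "train", benchmark,
        "--iterations-per-epoch", str(iterations_per_epoch),
        "--num-samples", str(num_samples),
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe string.
    print(shlex.join(build_train_cmd()))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues.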

Summary: Embedding training (step 2 of the "embedding" stage of Proto-X) now runs without crashing, using only config files inside the dbgym repo and dependencies from the dbgym_workspace directory.

Demo: pat_test.sh completes a fast run without crashing (see video). Note that pat_test.sh contains only a single invocation of task.py. After the run, the configs, dependencies, and results automatically appear in the run_*/ folder (see image).

https://github.com/cmu-db/dbgym/assets/20631215/0b33f4a4-eb2e-479e-b206-dee421fb7e63

Screenshot 2024-02-26 at 09 32 27
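To check the contents of a run_*/ folder without opening it by hand, something like the following could be used. It is a rough sketch under assumptions: the PR does not say where the run_*/ folders are created, so the parent directory is passed in explicitly, and `latest_run_dir` and `list_run_contents` are hypothetical helpers, not dbgym functions.

```python
from pathlib import Path


def latest_run_dir(root):
    """Return the most recently modified run_* directory under root, or None.

    Assumption: run folders sit directly under `root` and match run_*.
    """
    run_dirs = [p for p in Path(root).glob("run_*") if p.is_dir()]
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


def list_run_contents(run_dir):
    """Return sorted relative paths of everything inside run_dir."""
    run_dir = Path(run_dir)
    return sorted(p.relative_to(run_dir).as_posix() for p in run_dir.rglob("*"))


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory holding run_*/ folders.
    newest = latest_run_dir(".")
    if newest is None:
        print("no run_* directory found")
    else:
        for entry in list_run_contents(newest):
            print(entry)
```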

Details: