Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

[FEAT] GHA workflow to perform tpch benchmarking #3184

Open raunakab opened 3 weeks ago

raunakab commented 3 weeks ago

Overview

Create a new GHA workflow that builds a commit and runs tpch against it.

Notes

There are 2 main workflows:

  1. build-commit.yaml
  2. run-tpch.yaml

A third workflow, build-commit-run-tpch.yaml, simply runs the two above in sequence.

I've also made some changes to benchmarking/tpch/__main__.py. Namely:

  1. Added all env-vars that start with DAFT to the ray-runtime-env variables that are sent during ray-cluster initialization.
  2. Added a flag to turn off sending the daft module to the ray-cluster during initialization.
    • There is no need to pickle the daft module and send it over; it is already installed on the ray-cluster from the AWS S3 link pointing to the prebuilt Python wheel.
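
As a rough illustration of the two changes above, here is a minimal sketch of how such a runtime environment could be assembled. It is not the code from this PR: the function name `build_runtime_env` and the `ship_daft_module` parameter are hypothetical, and the string `"daft"` is a placeholder for the imported module object. The `env_vars` and `py_modules` keys are standard fields of Ray's `runtime_env` dictionary.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_runtime_env(
    environ: Optional[Mapping[str, str]] = None,
    ship_daft_module: bool = True,
) -> Dict[str, Any]:
    """Build a dict suitable for passing as ray.init(runtime_env=...).

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ

    # Forward every environment variable whose name starts with DAFT.
    runtime_env: Dict[str, Any] = {
        "env_vars": {k: v for k, v in environ.items() if k.startswith("DAFT")}
    }

    # Optionally ship the local daft module to the cluster. When the cluster
    # already has daft installed from a prebuilt wheel, this can be skipped.
    if ship_daft_module:
        runtime_env["py_modules"] = ["daft"]  # placeholder for the module object

    return runtime_env


if __name__ == "__main__":
    example_env = {"DAFT_RUNNER": "ray", "HOME": "/root"}
    print(build_runtime_env(example_env, ship_daft_module=False))
```

With `ship_daft_module=False`, only the filtered environment variables are sent, which matches the case where the wheel is already installed on the cluster.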

I've summarized the workflows individually down below:

build-commit workflow

run-tpch workflow

build-commit-run-tpch workflow

codspeed-hq[bot] commented 3 weeks ago

CodSpeed Performance Report

Merging #3184 will degrade performances by 34.59%

Comparing feat/infra (e908742) with main (ec39dc0)

Summary

❌ 1 regressions
✅ 16 untouched benchmarks

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | feat/infra | Change |
|---|---|---|---|
| test_iter_rows_first_row[100 Small Files] | 264.6 ms | 404.6 ms | -34.59% |
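
The report does not state how the change percentage is derived. The sketch below shows one reading that is consistent with the figures in the row above: the change is measured relative to the new (slower) timing rather than the baseline. The formula and the function name `relative_change` are assumptions for illustration, not CodSpeed's documented method. The small gap from the reported -34.59% would be explained by the timings being rounded for display.

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Percentage change relative to the head timing (assumed formula)."""
    return (base_ms - head_ms) / head_ms * 100.0


if __name__ == "__main__":
    base_ms, head_ms = 264.6, 404.6  # timings taken from the benchmark row
    print(f"change relative to head: {relative_change(base_ms, head_ms):.2f}%")
    print(f"slowdown factor: {head_ms / base_ms:.2f}x")
```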
codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 76.37%. Comparing base (ec39dc0) to head (e908742).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3184      +/-   ##
==========================================
- Coverage   76.54%   76.37%   -0.17%
==========================================
  Files         685      685
  Lines       85269    85135     -134
==========================================
- Hits        65266    65020     -246
- Misses      20003    20115     +112
```

[see 35 files with indirect coverage changes](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3184/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)


raunakab commented 1 week ago

Example: https://github.com/Eventual-Inc/Daft/actions/runs/11926775586

Run by @desmondcheongzx, submitted locally using the gh CLI tool. The invocation was:

gh workflow run build-commit-run-tpch.yaml --ref $BRANCH_NAME -f skip_questions=$SKIP_QUESTIONS
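
For scripted submissions, the same invocation can be assembled as an argument list. This is only a sketch: the helper name `build_gh_command` is hypothetical, and the branch and `skip_questions` values in the example are placeholders, since the accepted format of `skip_questions` is not described here. The `--ref` and `-f` flags are the ones used in the command above.

```python
import shlex
from typing import List


def build_gh_command(branch: str, skip_questions: str) -> List[str]:
    """Return the argv for dispatching the combined workflow via the gh CLI."""
    return [
        "gh", "workflow", "run", "build-commit-run-tpch.yaml",
        "--ref", branch,
        "-f", f"skip_questions={skip_questions}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the real branch and input value.
    argv = build_gh_command("my-branch", "placeholder-value")
    print(shlex.join(argv))
    # To actually dispatch the run: subprocess.run(argv, check=True)
```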
raunakab commented 1 week ago

Tagging @colin-ho: you recently touched benchmarking/tpch/__main__.py, so I wanted to run my changes to that file by you first.