Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0
2.34k stars 164 forks

[CHORE] Write tpch parquet files one at a time #3396

Open colin-ho opened 11 hours ago

colin-ho commented 11 hours ago

When you specify a num_parts parameter when generating tpch files, it first generates num_parts CSVs, then reads those CSVs and writes them to parquet using Daft.

However, write_parquet does not preserve the number of input files, e.g. even if there are 16 input files, there might be only 1 output file.

The fix here is to read and write 1 file at a time.
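A minimal sketch of the one-file-at-a-time approach follows; it is not the code in this PR. The helper names (`daft_csv_to_parquet`, `convert_one_at_a_time`), the `*.csv` glob pattern, and the pipe-delimited, headerless CSV layout are assumptions made for illustration. The converter is passed in as a callable so the loop can be exercised without Daft installed.

```python
from pathlib import Path
from typing import Callable, List


def daft_csv_to_parquet(csv_path: str, out_dir: str) -> None:
    """Convert a single CSV file to parquet with Daft (hypothetical helper)."""
    import daft  # imported lazily; assumes Daft is installed

    # Assumption: the generated CSVs are pipe-delimited and have no header row.
    df = daft.read_csv(csv_path, has_headers=False, delimiter="|")
    # One input file per write call, so Daft never gets the chance to
    # combine several input files into a single output file.
    df.write_parquet(out_dir)


def convert_one_at_a_time(
    csv_dir: str,
    out_dir: str,
    convert: Callable[[str, str], None] = daft_csv_to_parquet,
    pattern: str = "*.csv",  # assumption: file naming of the generated parts
) -> List[str]:
    """Call `convert` once per matching file; return the inputs in processing order."""
    csv_paths = sorted(str(p) for p in Path(csv_dir).glob(pattern))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for path in csv_paths:
        convert(path, out_dir)
    return csv_paths
```

With the default converter, `convert_one_at_a_time("csv_parts", "parquet_out")` issues one read and one write per part instead of a single read over all parts followed by a single write.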

codspeed-hq[bot] commented 11 hours ago

CodSpeed Performance Report

Merging #3396 will not alter performance

Comparing colin/gen-parquet (85fd788) with main (ec39dc0)

Summary

✅ 17 untouched benchmarks

codecov[bot] commented 11 hours ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.35%. Comparing base (3394a66) to head (85fd788). Report is 4 commits behind head on main.

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3396/graphs/tree.svg?width=650&height=150&src=pr&token=J430QVFE89&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3396?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)

```diff
@@            Coverage Diff             @@
##             main    #3396      +/-   ##
==========================================
- Coverage   77.35%   77.35%   -0.01%
==========================================
  Files         685      685
  Lines       83631    83637       +6
==========================================
+ Hits        64695    64697       +2
- Misses      18936    18940       +4
```

[see 6 files with indirect coverage changes](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3396/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)

graphite-app[bot] commented 11 hours ago

Graphite Automations

"Request reviewers once CI passes" took an action on this PR • (11/21/24)

1 reviewer was added to this PR based on Andrew Gazelka's automation.