Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0
2.39k stars, 170 forks

[PERF] Remove upfront buffer allocations for local CSV reader #3242

Closed. desmondcheongzx closed this 3 weeks ago

desmondcheongzx commented 3 weeks ago

The local CSV reader currently makes upfront buffer allocations (80 MiB for file slabs and 80 MiB for CSV buffers). This needlessly inflates the read time for small CSV files, which never use that many buffers.

Since the local CSV reader allocates additional buffers as needed, we can remove all upfront allocations without affecting anything else in the implementation of the reader. This speeds up reads of small files.
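To illustrate why dropping the upfront allocations is safe, here is a minimal Python sketch of a buffer pool that grows on demand. It is not the reader's actual code (which is written in Rust); `BufferPool`, `read_chunks`, the 4 KiB buffer size, and the count of 20 upfront buffers are all hypothetical names and numbers chosen for illustration.

```python
class BufferPool:
    """Hypothetical pool of fixed-size buffers that grows on demand."""

    def __init__(self, buffer_size, initial_buffers=0):
        self.buffer_size = buffer_size
        self.allocated = 0  # total number of buffers ever allocated
        self._free = []
        # Upfront allocation; with initial_buffers=0 this loop does nothing.
        for _ in range(initial_buffers):
            self._free.append(self._allocate())

    def _allocate(self):
        self.allocated += 1
        return bytearray(self.buffer_size)

    def acquire(self):
        # Reuse a free buffer if one exists, otherwise allocate a new one.
        return self._free.pop() if self._free else self._allocate()

    def release(self, buf):
        self._free.append(buf)


def read_chunks(data, pool):
    """Copy data through pooled buffers chunk by chunk; return bytes processed."""
    total = 0
    for start in range(0, len(data), pool.buffer_size):
        buf = pool.acquire()
        chunk = data[start:start + pool.buffer_size]
        buf[:len(chunk)] = chunk
        total += len(chunk)
        pool.release(buf)
    return total


if __name__ == "__main__":
    small_csv = b"a,b\n1,2\n"
    eager = BufferPool(4096, initial_buffers=20)  # pays for 20 buffers upfront
    lazy = BufferPool(4096)                       # pays only when a buffer is needed
    read_chunks(small_csv, eager)
    read_chunks(small_csv, lazy)
    print(eager.allocated, lazy.allocated)
```

Because `acquire` already falls back to allocating when the pool is empty, setting the initial count to zero changes only when the memory is paid for, not how the read itself proceeds.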

I also benchmarked the reader against the test case described in https://github.com/Eventual-Inc/Daft/pull/3055 and found no consistent slowdown without the upfront allocations.

codspeed-hq[bot] commented 3 weeks ago

CodSpeed Performance Report

Merging #3242 will not alter performance

Comparing desmondcheongzx:remove-upfront-csv-buffer-allocations (25cb584) with main (baca61e)

Summary

✅ 17 untouched benchmarks