The local CSV reader currently makes upfront buffer allocations (80 MiB for file slabs and 80 MiB for CSV buffers). This unnecessarily inflates read times for small CSV files, which never use that many buffers.
Since the local CSV reader allocates additional buffers as needed, we can remove all upfront allocations without affecting anything else in the implementation of the reader. This speeds up reads of small files.
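To illustrate why dropping the preallocation is safe, here is a minimal Python sketch of a free-list buffer pool that can either preallocate or allocate on demand. It assumes a simple acquire/release pool; `BufferPool`, `acquire`, `release`, and `read_chunks` are hypothetical names for illustration only, not Daft's actual Rust types or APIs.

```python
from collections import deque


class BufferPool:
    """Hypothetical pool of fixed-size buffers (illustration only).

    initial_buffers controls how many buffers are allocated eagerly;
    with 0, every buffer is created the first time it is needed.
    """

    def __init__(self, buffer_size: int, initial_buffers: int = 0) -> None:
        self.buffer_size = buffer_size
        self.allocated = 0  # total number of buffers this pool has ever created
        self._free = deque()  # buffers that are currently available for reuse
        for _ in range(initial_buffers):
            self._free.append(self._new_buffer())

    def _new_buffer(self) -> bytearray:
        self.allocated += 1
        return bytearray(self.buffer_size)

    def acquire(self) -> bytearray:
        # Reuse a free buffer if one exists; otherwise allocate a new one.
        return self._free.popleft() if self._free else self._new_buffer()

    def release(self, buf: bytearray) -> None:
        # Return the buffer so later chunks reuse it instead of allocating.
        self._free.append(buf)


def read_chunks(pool: BufferPool, num_chunks: int) -> None:
    """Simulate a reader that takes one buffer per chunk and then returns it."""
    for _ in range(num_chunks):
        buf = pool.acquire()
        pool.release(buf)
```

For a small file read in a couple of sequential chunks, the on-demand pool creates a single buffer, while a preallocating pool pays for all of its buffers up front regardless of how many the read actually uses. Because `acquire` falls back to allocating when the free list is empty, a larger file still gets as many buffers as it needs.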
I also benchmarked the reader against the test case described in https://github.com/Eventual-Inc/Daft/pull/3055 and found no consistent slowdown without the upfront allocations.