Closed by Fokko 2 weeks ago
make: *** [Makefile:55: test-integration] Aborted (core dumped)
uh oh
@kevinjqliu I think the test is a bit too much. According to your comment here https://github.com/apache/iceberg-python/pull/1539#discussion_r1922705843 the test allocates almost 5 GB 😀
2^32 (4_294_967_296) bytes is around 4 GB; we just need to test a scenario greater than that.
Second attempt of https://github.com/apache/iceberg-python/pull/1539
This was already being discussed back here: https://github.com/apache/iceberg-python/issues/208#issuecomment-1889891973
This PR changes from doing a sort followed by a single pass over the table to an approach where we determine the unique partition tuples and filter on each of them individually.
Fixes https://github.com/apache/iceberg-python/issues/1491, because the sort caused buffers to be joined to the point where they would overflow in Arrow. I think this is an issue on the Arrow side, and it should automatically break up into smaller buffers. The `combine_chunks` method does this correctly.

Now:
Before:
So it comes with a nice speedup as well :)