Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Performance optimization #9588

Open Fokko opened 5 months ago

Fokko commented 5 months ago

Apache Iceberg version

None

Query engine

None

Please describe the bug 🐞

Seeing if there is anything we can do to improve performance:

profile

liurenjie1024 commented 5 months ago

It seems a lot of time is spent on decompression.

Fokko commented 5 months ago

I noticed that as well, though that part is happening in C rather than in Python.

Fokko commented 5 months ago

s3fs: output

PyArrow: output

Also, the PyArrow import is very slow :(

The GIL contention looks quite bad, but I'm running MinIO locally, so the I/O is minimal.
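
To put a number on the slow PyArrow import mentioned above, something like the following could be used. This is only a rough sketch: `time_import` is a hypothetical helper written for illustration, not part of PyIceberg or PyArrow, and it measures wall-clock time for a single import in the current process.

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds taken to import `module_name`.

    Only the first import in a process is meaningful: once a module is in
    sys.modules, importing it again is little more than a dictionary lookup.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

Calling `time_import("pyarrow")` in a fresh interpreter gives the cold import cost. For a per-module breakdown, CPython's built-in `python -X importtime -c "import pyarrow"` prints the time spent in each imported module to stderr.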