apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

zorder does not work with sub fields #10017

Open: cccs-jc opened this issue 6 months ago

cccs-jc commented 6 months ago

Apache Iceberg version

1.4.2

Query engine

Spark

Please describe the bug 🐞

The rewrite_data_files procedure does not work with a zorder sort order on nested sub-fields (fields inside a struct column).

CALL users.system.rewrite_data_files(
        table => 'users.jcc.flow',
        options => map('max-concurrent-file-group-rewrites', '20',
                       'partial-progress.enabled', 'true',
                       'rewrite-all', 'true'),
        strategy => 'sort',
        sort_order => 'zorder(SRC_IP.v4, DST_IP.v4)',
        where => "END_TIME >= TIMESTAMP '2024-02-12'
            AND END_TIME < TIMESTAMP '2024-02-12' + INTERVAL 1 DAY"
        )

I get the error: java.lang.IllegalArgumentException: SRC_IP.v4 does not exist

The relevant columns of the table schema are SRC_IP: struct<v4:bigint,v6:binary> and DST_IP: struct<v4:bigint,v6:binary>.
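For context on what the failing call is trying to do: a zorder sort maps several columns onto a single sort key by interleaving their bits, so rows that are close in both SRC_IP.v4 and DST_IP.v4 tend to land in the same data files. The Python sketch below illustrates that general technique for two 32-bit values. It is not Iceberg's implementation; the helper names interleave_bits and zorder_key are hypothetical, and it assumes (also hypothetically) that each v4 bigint holds an IPv4 address as an unsigned 32-bit integer.

```python
import ipaddress


def interleave_bits(x: int, y: int, bits: int = 32) -> int:
    """Interleave the low `bits` bits of x and y into one z-order key.

    Bit i of x goes to position 2*i + 1 and bit i of y to position 2*i,
    so the most significant bit of x becomes the top bit of the key.
    """
    if not (0 <= x < (1 << bits) and 0 <= y < (1 << bits)):
        raise ValueError(f"values must be unsigned {bits}-bit integers")
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)
        key |= ((y >> i) & 1) << (2 * i)
    return key


def zorder_key(src: str, dst: str) -> int:
    # Hypothetical: assumes SRC_IP.v4 / DST_IP.v4 store the IPv4 address
    # as an unsigned 32-bit integer inside the bigint field.
    return interleave_bits(
        int(ipaddress.IPv4Address(src)), int(ipaddress.IPv4Address(dst))
    )


# Illustrative (src, dst) pairs; sorting by the z-order key clusters
# rows that are close in both addresses.
flows = [
    ("172.16.0.9", "8.8.8.8"),
    ("10.0.0.2", "192.168.1.6"),
    ("10.0.0.1", "192.168.1.5"),
]
for src, dst in sorted(flows, key=lambda f: zorder_key(*f)):
    print(src, dst, hex(zorder_key(src, dst)))
```

A real engine has to resolve SRC_IP.v4 and DST_IP.v4 to the nested fields before it can compute such a key, which is the step the reported error suggests is missing.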

singhpk234 commented 5 months ago

Can this fit your use case? https://github.com/apache/iceberg/pull/9818/files

cccs-jc commented 5 months ago

Seems like it would. I'm not a reviewer, but I do want the fix :-)

cccs-jc commented 3 months ago

The issue with nested fields for zorder still exists. Any chance you have time to complete the PR?

cccs-jc commented 2 months ago

@singhpk234 Just following up on the lack of support for nested fields when applying z-ordering.

cccs-jc commented 1 month ago

@RussellSpitzer, do you think the PR mentioned by @singhpk234 (https://github.com/apache/iceberg/pull/9818/files) could get merged?