apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0
919 stars 147 forks source link

Fix the edge case when handling non numeric values of double type in delta stats #526

Closed emilie-wang closed 2 months ago

emilie-wang commented 2 months ago

Important Read

What is the purpose of the pull request

The pr aims to fix: https://github.com/apache/incubator-xtable/issues/524. When reading the delta snapshot and load the information into Delta object AddFile, the non-numeric values of float or double type (example, "NaN", "-Infinity") from col stats become string type. These special values need special handling and this pr used the same idea how delta new API handled this: https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/data/DefaultJsonRow.java#L210

Brief change log

Verify this pull request

This change added tests and can be verified as follows:

emilie-wang commented 2 months ago

Hi, Can I have your help to rerun the Ci build pipeline. I couldn't replay the same error locally:

Error:  org.apache.xtable.hudi.extensions.TestAddFieldIdsClientInitCallback.existingTable -- Time elapsed: 0.219 s <<< ERROR!
java.lang.NullPointerException
the-other-tim-brown commented 2 months ago

Hi, Can I have your help to rerun the Ci build pipeline. I couldn't replay the same error locally:

Error:  org.apache.xtable.hudi.extensions.TestAddFieldIdsClientInitCallback.existingTable -- Time elapsed: 0.219 s <<< ERROR!
java.lang.NullPointerException

@emilie-wang I re-ran it for you and everything is passing. Just had some minor comments and questions. Thank you for the quick fix!

the-other-tim-brown commented 2 months ago

@emilie-wang one final request is for you to squash down to a single commit for me so I can approve and merge this into the main branch. Thanks again for the contribution!

emilie-wang commented 2 months ago

Hi @the-other-tim-brown, I am back from vacation and just updated my PR based on your comments. Thank you for your review!

the-other-tim-brown commented 2 months ago

Hi @the-other-tim-brown, I am back from vacation and just updated my PR based on your comments. Thank you for your review!

@emilie-wang hope you had a good vacation! Can you squash this all down to 1 commit for me? I've used a command like git reset --soft $(git merge-base main HEAD) in the past to do this

emilie-wang commented 2 months ago

@the-other-tim-brown sorry I forgot this mesage and just squashed down to 1 single commit. Thank you for the quick review again.