apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.63k stars 3.56k forks source link

GH-44767: [C++] Fix Float16.To{Little,Big}Endian on big endian machines #44768

Closed QuLogic closed 3 days ago

QuLogic commented 3 days ago

Rationale for this change

See issue.

What changes are included in this PR?

For ToLittleEndian/ToBigEndian, the result should always be in the specified endianness, not depend on host order.

In the test, instead of casting the uint8_t data into a uint16_t (with unspecified endianness handling), compare the bytes directly in their expected orders.

Are these changes tested?

Tested on little-endian, still building for big-endian.

Are there any user-facing changes?

Fixes #44767

github-actions[bot] commented 3 days ago

:warning: GitHub issue #44767 has been automatically assigned in GitHub to PR creator.

QuLogic commented 3 days ago

Confirmed to pass on a big-endian machine as well.

conbench-apache-arrow[bot] commented 3 days ago

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 59decc321953e8072c2a48c4c085d0fd1ceba774.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

QuLogic commented 2 days ago

For the record, did you try running the entire test suite on a big-endian machine?

Yes, I've run them all, but there are still several failures to go:

90% tests passed, 9 tests failed out of 89

The following tests FAILED:
     29 - arrow-compute-aggregate-test (Failed)
     68 - arrow-dataset-file-parquet-test (Failed)
     73 - arrow-flight-test (Failed)
     76 - arrow-ipc-read-write-test (Failed)
     81 - parquet-internals-test (Failed)
     82 - parquet-reader-test (Failed)
     84 - parquet-arrow-test (Failed)
     85 - parquet-arrow-internals-test (Failed)
     87 - parquet-encryption-key-management-test (Failed)
pitrou commented 2 days ago

@QuLogic Thanks. Parquet failures are expected unfortunately, as the work of making Parquet C++ big endian-compatible has not been done.

The IPC and compute failures are a bit more worrying.

For the record, are you working for a company that provides big endian systems?

QuLogic commented 2 days ago

Oops, I forgot to set the test environment variables; there's no arrow-flight-test failure:

91% tests passed, 8 tests failed out of 89

The following tests FAILED:
     29 - arrow-compute-aggregate-test (Failed)
     68 - arrow-dataset-file-parquet-test (Failed)
     69 - arrow-dataset-file-parquet-encryption-test (Failed)
     81 - parquet-internals-test (Failed)
     82 - parquet-reader-test (Failed)
     84 - parquet-arrow-test (Failed)
     85 - parquet-arrow-internals-test (Failed)
     87 - parquet-encryption-key-management-test (Failed)

For the record, are you working for a company that provides big endian systems?

No, I'm trying to fix geopandas on Fedora s390x.

pitrou commented 2 days ago

Could you run arrow-compute-aggregate-test in verbose mode and open a new issue with the failures perhaps?

QuLogic commented 2 days ago

Ah, those are https://github.com/apache/arrow/issues/12681#issuecomment-2485433056

pitrou commented 2 days ago

Ah, thanks. These are minor precision issues in the test, it seems.