dmlc / dmlc-core

A common bricks library for building scalable and portable distributed machine learning.
Apache License 2.0
864 stars 520 forks source link

Revert "Simplify Endian detection logic (#541)" #543

Closed hcho3 closed 5 years ago

hcho3 commented 5 years ago

Unfortunately, #541 does not actually work on big-endian machines. I've tried it on an emulated SPARC machine (emulated with QEMU) and __dmlc_is_system_little_endian() still returned 1.

The DMLC serializer needs a compile-time flag for endianness, so running endian detection at runtime (via pointer casting) is not an option. So for now, I'm reverting #541 with some refinements.

cc @rongou

nhynes commented 5 years ago

Do you think that you could add your emulator to CI (without undue burden)? It'd be unfortunate if this happened again in four months after people forget that this is an issue :)

hcho3 commented 5 years ago

@nhynes I'm not sure if CI would help, because the DMLC serializer would have silently stored artifacts in the wrong byte order, but round-trip tests won't catch this problem (deserialization would also occur in the wrong byte order).

hcho3 commented 5 years ago

Hmm, actually maybe we should add a test for the macro DMLC_LITTLE_ENDIAN, and then add a big-endian target in the CI. Let me look.

hcho3 commented 5 years ago

@nhynes I managed to get Docker working with S390X emulation: https://github.com/hcho3/docker-s390x-travis. Notice that the build log prints out "Big Endian" at the end. I will add it to dmlc-core's CI soon.

trivialfis commented 5 years ago

Before we revert that, are you sure there's no simple fix? I like your cleaned up code.

hcho3 commented 5 years ago

@trivialfis (const uint8_t&)0xAABBCCDD) always evaluates to 0xDD, because the constant 0xAABBCCDD simply gets truncated down to 8-bits. So it actually doesn't detect the endianness of the machine.

I searched the Internet for a while and I couldn't really find a way to detect endian at compile-time (not run-time). In particular, pointer casting (*(const uint8_t *)&value) is illegal in a constexpr. We need compile-time detection, because the DMLC serializer uses template to choose the correct serialization logic for each data type, and we need the endian flag as a template argument.

I considered revising the DMLC serializer, so that DMLC_LITTLE_ENDIAN wouldn't need to be compile-time constant. I decided against it because then unsupported types will crash the serializer at runtime rather than generating a compilation error.

hcho3 commented 5 years ago

@trivialfis I don't think we are considering cross compilation at this point in time. Let us come back to it later. My guess is that we'll need runtime endian detection if we are cross compiling.

hcho3 commented 5 years ago

@nhynes I will add CI for a big-endian target in a follow-up pull request. Thanks for your suggestion.