apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++] Writing IPC messages with 64-byte buffer alignment vs. 8-byte default #25151

Open asfimport opened 4 years ago

asfimport commented 4 years ago

I used the C++ library to create a very small Arrow file (one field of 5 int32 values) and was surprised that the buffers are not aligned to 64 bytes, as described in the documentation section "Buffer Alignment and Padding". Based on the examples there, the 20 bytes of int32 data should be padded to 64 bytes, but the body is only 24 bytes (see below).

Extracted message metadata (note bodyLength = 24):


{
  version: "V4",
  header_type: "RecordBatch",
  header: {
    nodes: [
      {
        length: 5,
        null_count: 0
      }
    ],
    buffers: [
      {
        offset: 0,
        length: 0
      },
      {
        offset: 0,
        length: 20
      }
    ]
  },
  bodyLength: 24
} 
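The arithmetic behind the two candidate sizes can be sketched as follows. This is only an illustration of rounding a length up to a multiple of the alignment; `PaddedLength` and `Int32DataBytes` are hypothetical helper names, not functions from the Arrow C++ library.

```cpp
#include <cassert>
#include <cstdint>

// Round `length` up to the next multiple of `alignment`.
// `alignment` must be a positive power of two (8 and 64 both qualify).
inline std::int64_t PaddedLength(std::int64_t length, std::int64_t alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (length + alignment - 1) & ~(alignment - 1);
}

// Bytes occupied by `count` int32 values before any padding.
inline std::int64_t Int32DataBytes(std::int64_t count) {
  return count * static_cast<std::int64_t>(sizeof(std::int32_t));
}
```

With these helpers, `PaddedLength(Int32DataBytes(5), 8)` gives 24, which matches the bodyLength above, while `PaddedLength(Int32DataBytes(5), 64)` gives 64, which is what the "Buffer Alignment and Padding" examples would suggest.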

Reading further down the documentation, the section "Encapsulated message format" says serialization should use 8-byte alignment.

These two sections seem at odds with each other, and some clarification is needed.

Is the documentation wrong? Or should 8-byte alignment be used for the File format and 64-byte alignment for IPC?

Reporter: Anthony Abate / @abbotware

Note: This issue was originally created as ARROW-9035. Please see the migration documentation for further details.

asfimport commented 4 years ago

Anthony Abate / @abbotware: Perhaps in RFC terms (https://tools.ietf.org/html/rfc2119) the doc should say:

All buffers (both the Flatbuffers metadata and the data buffers) MUST be 8-byte aligned but SHOULD be 64-byte aligned. This would apply to both sections.
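To make the proposed MUST/SHOULD wording concrete, here is a minimal sketch of how an offset or padded length could be classified under it. The enum and function names are hypothetical, and the rule encoded here is the wording suggested in this comment, not text quoted from the Arrow specification.

```cpp
#include <cstdint>

// Classification under the proposed wording:
// MUST be 8-byte aligned, SHOULD be 64-byte aligned.
enum class AlignmentStatus {
  kInvalid,      // not a multiple of 8: violates the MUST
  kMinimal,      // multiple of 8 but not of 64: valid, misses the SHOULD
  kRecommended   // multiple of 64: satisfies both
};

// `value` is a non-negative byte offset or padded length.
inline AlignmentStatus ClassifyAlignment(std::int64_t value) {
  if (value % 8 != 0) return AlignmentStatus::kInvalid;
  if (value % 64 != 0) return AlignmentStatus::kMinimal;
  return AlignmentStatus::kRecommended;
}
```

Under this reading, a 24-byte body would be classified as `kMinimal`: acceptable, but short of the recommended 64-byte alignment.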

With most of the docs stressing 64-byte alignment, I didn't realize that the default alignment in the C++ library is 8 bytes; I assumed it would be 64 bytes.

asfimport commented 4 years ago

Wes McKinney / @wesm: It's supposed to be configurable

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/options.h#L47

I don't know if this configuration parameter is respected though. It's a big project and there are only so many helping hands.

asfimport commented 4 years ago

Anthony Abate / @abbotware: Yes, I didn't realize it was configurable. It probably works (but I'll know soon if it doesn't).

I thought the doc sections were in conflict, but now I realize that 8-byte alignment is the requirement, not 64 (64 is still a multiple of 8).