When trying to implement CRC computation in Parquet C++, we found the wording to be ambiguous.
Clarify that CRC computation happens on the exact binary serialization of the page (rather than a long-winded and confusing description of the v1 and v2 data page layouts).
Also, clarify that CRC computation can apply to all page kinds, not only data pages (for reference, parquet-mr currently supports checksumming v1 data pages as well as dictionary pages).
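As a rough illustration of what "exact binary serialization" means for an implementer, the sketch below computes the checksum over the page bytes exactly as they are written to disk (after any compression or encryption), excluding the page header. It assumes the standard CRC-32 used by gzip/zlib (polynomial 0x04C11DB7, processed here in its reflected form 0xEDB88320); `Crc32` and `PageChecksum` are hypothetical helper names for illustration, not Parquet C++ APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF).
// Assumption: this is the same CRC-32 variant as gzip/zlib.
uint32_t Crc32(const uint8_t* data, std::size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the low bit was set.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical helper: the input is the page body exactly as serialized on disk
// (any page kind: data page v1/v2, dictionary page, ...), without the page header.
// No decompression or re-encoding happens before checksumming.
uint32_t PageChecksum(const std::vector<uint8_t>& serialized_page_body) {
  return Crc32(serialized_page_body.data(), serialized_page_body.size());
}
```

Because the checksum covers opaque on-disk bytes, a reader can verify it before decompressing or decoding anything, and the same routine works regardless of page kind or data page version.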
See also the discussion at https://github.com/apache/parquet-format/pull/126#issuecomment-1348081137 and the comments that follow it.