Open Jefffrey opened 3 months ago
FYI I've raised a PR to update ORC spec for timestamp: https://github.com/apache/orc/pull/1867
Since that was what caused initial confusion for me, as some key implementation details were not mentioned in the spec leading to the initial inaccurate implementation
Also worth noting how arrow deals with this:
Note that the 2038 test is dependent on https://github.com/chronotope/chrono-tz/issues/155 as it tests timestamps 2100 and beyond, which chrono_tz currently has inaccurate DST for
Partially resolved by https://github.com/datafusion-contrib/datafusion-orc/commit/60288cdee57f72dec84c3fd9f6085561568aad49
Still pending chrono_tz fix for large date DST
ORC timestamp type is not straightforward, as though it apparently represents a timestamp without a timezone, its encoding & decoding is still dependent on the writer timezone (encoded in the stripe) and the reader timezone.
This is why the 1900 & 2038 integration tests are currently disabled as our implementation is incorrect:
https://github.com/datafusion-contrib/datafusion-orc/blob/3369b5d7bb110dafde64288416c86bc8c996e465/tests/integration/main.rs#L155-L165