apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

Example files are using legacy timezone names (US/Pacific) #2049

Open bdice opened 1 month ago

bdice commented 1 month ago

The example ORC files use a timezone of US/Pacific which is no longer included in all Linux distributions. Ubuntu 24.04, for example, has moved this to a separate tzdata-legacy package. This can cause issues for ORC file readers on systems missing that legacy time zone data.

Should the example ORC files be updated to use a more current time zone name, like America/Los_Angeles?
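One way to see whether a given system still ships the legacy name is to look for the zone file directly. The sketch below is illustrative only: the helper name `zone_file_present` is hypothetical, and using Python's `zoneinfo.TZPATH` as a stand-in for the directories an ORC reader searches is an assumption, not something confirmed by the ORC code.

```python
import os
import zoneinfo


def zone_file_present(name, search_paths=None):
    """Return True if a zone file called `name` exists under any search path."""
    if search_paths is None:
        # Assumption: zoneinfo.TZPATH (the directories Python itself searches)
        # approximates where an ORC reader would look for zone files.
        search_paths = zoneinfo.TZPATH
    return any(os.path.isfile(os.path.join(path, name)) for path in search_paths)


# Report on the legacy name and the suggested replacement from this issue.
for zone in ("US/Pacific", "America/Los_Angeles"):
    status = "present" if zone_file_present(zone) else "missing"
    print(f"{zone}: {status}")
```

On a system without the legacy time zone data, the first name would be expected to report as missing while the second is present.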

Verifying the time zone in the stripe footers:

wget https://github.com/apache/orc/raw/refs/heads/main/examples/TestOrcFile.testDate1900.orc
orc-metadata -v TestOrcFile.testDate1900.orc
# Shows stripe footers with "timezone": "US/Pacific"
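For files with many stripes, the same check could be scripted. This is a minimal sketch that assumes the JSON printed by `orc-metadata -v` has a top-level "stripes" list whose entries carry a "timezone" key, as in the output quoted later in this thread; the function name `stripe_timezones` and the sample string are made up for illustration.

```python
import json


def stripe_timezones(metadata_json):
    """Return the sorted, de-duplicated timezone names found in the stripe entries."""
    metadata = json.loads(metadata_json)
    return sorted({
        stripe["timezone"]
        for stripe in metadata.get("stripes", [])
        if "timezone" in stripe
    })


# Hypothetical, heavily trimmed stand-in for real `orc-metadata -v` output.
sample = (
    '{"stripes": ['
    '{"stripe": 0, "timezone": "US/Pacific"}, '
    '{"stripe": 1, "timezone": "US/Pacific"}]}'
)
print(stripe_timezones(sample))  # prints ['US/Pacific']
```

In practice the input string would be the captured output of the command above rather than the inline sample.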

Additional context

https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/2058249
https://github.com/apache/arrow/issues/40633
https://github.com/pandas-dev/pandas/issues/56292
https://github.com/rapidsai/cudf/pull/16998#issuecomment-2400980607

dongjoon-hyun commented 1 month ago

Thank you for reporting, @bdice .

cc @williamhyun , @wgtmac , too.

dongjoon-hyun commented 1 month ago

@bdice, according to our official Java tool, the type of the column time is timestamp without a time zone, isn't it?

$ orc-tools version
ORC 2.0.2

$ orc-tools meta ./examples/TestOrcFile.testDate1900.orc | grep Type
Processing data file examples/TestOrcFile.testDate1900.orc [length: 30941]
Type: struct<time:timestamp,date:date>

Please see here. Given that there is no timezone, I'm not sure if the root cause is the file.

Instead, it looks like an issue on the C++ library side, because orc-metadata is based on the C++ library. BTW, ORC-1481 was already fixed in Apache ORC 2.0.0. Do you mean that you hit this issue with Apache ORC 2.0+?

wgtmac commented 1 month ago

It looks like a breaking change to time zone names in TZDB. I will take a look. cc @ffacs

dongjoon-hyun commented 1 month ago

Thank you so much, @wgtmac .

wgtmac commented 4 weeks ago

https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/2058249 explains the root cause: tzdata has moved time zone files such as US/Pacific into a separate tzdata-legacy package and intentionally provides no symlinks, so this is a breaking change for legacy ORC files. At the same time, some downstream projects that depend on the Apache ORC C++ library use ORC files from https://github.com/apache/orc/tree/main/examples for CI validation. Those CI jobs started to fail once they upgraded to Ubuntu 24.04, which ships the new version of tzdata without tzdata-legacy installed.

IMO, we should not change TestOrcFile.testDate1900.orc as it is a good example to check if tzdata-legacy is required. One thing that I don't understand is that we have CI jobs running on Ubuntu 24.04, yet they do not fail.

bdice commented 3 weeks ago

> IMO, we should not change TestOrcFile.testDate1900.orc as it is a good example to check if tzdata-legacy is required.

That is fine with me! I have worked around this by installing tzdata-legacy on Ubuntu 24.04. I can see the potential value here. I am okay with closing this issue with no action, if that is acceptable to others.

Another possible course of action would be to leave TestOrcFile.testDate1900.orc as-is, and update the timezone names in TestOrcFile.testDate2038.orc (currently also using US/Pacific).

2038 test file output Using `orc` 2.0.2: ```bash $ orc-metadata -v TestOrcFile.testDate2038.orc { "name": "TestOrcFile.testDate2038.orc", "type": "struct", "attributes": {}, "rows": 212000, "stripe count": 28, "format": "0.12", "writer version": "HIVE-8732", "software version": "ORC Java", "compression": "zlib", "compression block": 10000, "file length": 95787, "content": 94762, "stripe stats": 686, "footer": 314, "postscript": 24, "row index stride": 10000, "user metadata": { }, "stripes": [ { "stripe": 0, "rows": 15000, "offset": 3, "length": 6410, "index": 153, "data": 6194, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 3, "length": 21 }, { "id": 1, "column": 1, "kind": "index", "offset": 24, "length": 78 }, { "id": 2, "column": 2, "kind": "index", "offset": 102, "length": 54 }, { "id": 3, "column": 1, "kind": "data", "offset": 156, "length": 507 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 663, "length": 5416 }, { "id": 5, "column": 2, "kind": "data", "offset": 6079, "length": 271 } ], "timezone": "US/Pacific" }, { "stripe": 1, "rows": 5000, "offset": 6413, "length": 2214, "index": 76, "data": 2075, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 6413, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 6425, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 6462, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 6489, "length": 171 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 6660, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 8463, "length": 101 } ], "timezone": "US/Pacific" }, { "stripe": 2, "rows": 10000, "offset": 
8627, "length": 4321, "index": 76, "data": 4182, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 8627, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 8639, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 8676, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 8703, "length": 340 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 9043, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 12651, "length": 234 } ], "timezone": "US/Pacific" }, { "stripe": 3, "rows": 10000, "offset": 12948, "length": 4326, "index": 77, "data": 4186, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 12948, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 12960, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 12998, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 13025, "length": 341 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 13366, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 16974, "length": 237 } ], "timezone": "US/Pacific" }, { "stripe": 4, "rows": 5000, "offset": 17274, "length": 2229, "index": 76, "data": 2090, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 17274, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 17286, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 17323, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 17350, "length": 174 }, { "id": 
4, "column": 1, "kind": "secondary", "offset": 17524, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 19327, "length": 113 } ], "timezone": "US/Pacific" }, { "stripe": 5, "rows": 10000, "offset": 19503, "length": 4401, "index": 77, "data": 4261, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 19503, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 19515, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 19553, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 19580, "length": 416 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 19996, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 23604, "length": 237 } ], "timezone": "US/Pacific" }, { "stripe": 6, "rows": 5000, "offset": 23904, "length": 2268, "index": 76, "data": 2129, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 23904, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 23916, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 23953, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 23980, "length": 210 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 24190, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 25993, "length": 116 } ], "timezone": "US/Pacific" }, { "stripe": 7, "rows": 10000, "offset": 26172, "length": 4397, "index": 77, "data": 4257, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 26172, "length": 12 }, { "id": 
1, "column": 1, "kind": "index", "offset": 26184, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 26222, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 26249, "length": 419 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 26668, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 30276, "length": 230 } ], "timezone": "US/Pacific" }, { "stripe": 8, "rows": 5000, "offset": 30569, "length": 2269, "index": 76, "data": 2130, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 30569, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 30581, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 30618, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 30645, "length": 213 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 30858, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 32661, "length": 114 } ], "timezone": "US/Pacific" }, { "stripe": 9, "rows": 10000, "offset": 32838, "length": 4390, "index": 77, "data": 4250, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 32838, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 32850, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 32888, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 32915, "length": 411 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 33326, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 36934, "length": 231 } ], "timezone": "US/Pacific" }, { "stripe": 10, "rows": 5000, "offset": 37228, "length": 2268, "index": 76, "data": 2129, "footer": 63, "encodings": [ { 
"column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 37228, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 37240, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 37277, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 37304, "length": 211 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 37515, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 39318, "length": 115 } ], "timezone": "US/Pacific" }, { "stripe": 11, "rows": 10000, "offset": 39496, "length": 4399, "index": 77, "data": 4259, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 39496, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 39508, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 39546, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 39573, "length": 414 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 39987, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 43595, "length": 237 } ], "timezone": "US/Pacific" }, { "stripe": 12, "rows": 5000, "offset": 43895, "length": 2266, "index": 76, "data": 2127, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 43895, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 43907, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 43944, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 43971, "length": 211 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 44182, "length": 1803 }, 
{ "id": 5, "column": 2, "kind": "data", "offset": 45985, "length": 113 } ], "timezone": "US/Pacific" }, { "stripe": 13, "rows": 10000, "offset": 46161, "length": 4395, "index": 77, "data": 4255, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 46161, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 46173, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 46211, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 46238, "length": 412 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 46650, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 50258, "length": 235 } ], "timezone": "US/Pacific" }, { "stripe": 14, "rows": 5000, "offset": 50556, "length": 2267, "index": 76, "data": 2128, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 50556, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 50568, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 50605, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 50632, "length": 211 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 50843, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 52646, "length": 114 } ], "timezone": "US/Pacific" }, { "stripe": 15, "rows": 10000, "offset": 52823, "length": 4401, "index": 77, "data": 4261, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 52823, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 52835, "length": 38 }, { 
"id": 2, "column": 2, "kind": "index", "offset": 52873, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 52900, "length": 414 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 53314, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 56922, "length": 239 } ], "timezone": "US/Pacific" }, { "stripe": 16, "rows": 5000, "offset": 57224, "length": 2272, "index": 76, "data": 2133, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 57224, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 57236, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 57273, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 57300, "length": 211 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 57511, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 59314, "length": 119 } ], "timezone": "US/Pacific" }, { "stripe": 17, "rows": 10000, "offset": 59496, "length": 4396, "index": 76, "data": 4257, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 59496, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 59508, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 59545, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 59572, "length": 414 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 59986, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 63594, "length": 235 } ], "timezone": "US/Pacific" }, { "stripe": 18, "rows": 10000, "offset": 63892, "length": 4399, "index": 77, "data": 4259, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": 
"direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 63892, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 63904, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 63942, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 63969, "length": 416 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 64385, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 67993, "length": 235 } ], "timezone": "US/Pacific" }, { "stripe": 19, "rows": 5000, "offset": 68291, "length": 2265, "index": 76, "data": 2126, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 68291, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 68303, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 68340, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 68367, "length": 210 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 68577, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 70380, "length": 113 } ], "timezone": "US/Pacific" }, { "stripe": 20, "rows": 10000, "offset": 70556, "length": 4398, "index": 77, "data": 4258, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 70556, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 70568, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 70606, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 70633, "length": 413 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 71046, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 74654, 
"length": 237 } ], "timezone": "US/Pacific" }, { "stripe": 21, "rows": 5000, "offset": 74954, "length": 2263, "index": 76, "data": 2124, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 74954, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 74966, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 75003, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 75030, "length": 206 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 75236, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 77039, "length": 115 } ], "timezone": "US/Pacific" }, { "stripe": 22, "rows": 10000, "offset": 77217, "length": 4403, "index": 77, "data": 4263, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 77217, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 77229, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 77267, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 77294, "length": 417 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 77711, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 81319, "length": 238 } ], "timezone": "US/Pacific" }, { "stripe": 23, "rows": 5000, "offset": 81620, "length": 2266, "index": 77, "data": 2126, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 81620, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 81632, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 81670, 
"length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 81697, "length": 207 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 81904, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 83707, "length": 116 } ], "timezone": "US/Pacific" }, { "stripe": 24, "rows": 5000, "offset": 83886, "length": 2267, "index": 77, "data": 2127, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 83886, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 83898, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 83936, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 83963, "length": 213 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 84176, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 85979, "length": 111 } ], "timezone": "US/Pacific" }, { "stripe": 25, "rows": 5000, "offset": 86153, "length": 2265, "index": 76, "data": 2126, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 86153, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 86165, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 86202, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 86229, "length": 211 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 86440, "length": 1803 }, { "id": 5, "column": 2, "kind": "data", "offset": 88243, "length": 112 } ], "timezone": "US/Pacific" }, { "stripe": 26, "rows": 10000, "offset": 88418, "length": 4399, "index": 77, "data": 4259, "footer": 63, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" 
} ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 88418, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 88430, "length": 38 }, { "id": 2, "column": 2, "kind": "index", "offset": 88468, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 88495, "length": 414 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 88909, "length": 3608 }, { "id": 5, "column": 2, "kind": "data", "offset": 92517, "length": 237 } ], "timezone": "US/Pacific" }, { "stripe": 27, "rows": 2000, "offset": 92817, "length": 1945, "index": 76, "data": 1808, "footer": 61, "encodings": [ { "column": 0, "encoding": "direct" }, { "column": 1, "encoding": "direct rle2" }, { "column": 2, "encoding": "direct rle2" } ], "streams": [ { "id": 0, "column": 0, "kind": "index", "offset": 92817, "length": 12 }, { "id": 1, "column": 1, "kind": "index", "offset": 92829, "length": 37 }, { "id": 2, "column": 2, "kind": "index", "offset": 92866, "length": 27 }, { "id": 3, "column": 1, "kind": "data", "offset": 92893, "length": 89 }, { "id": 4, "column": 1, "kind": "secondary", "offset": 92982, "length": 1661 }, { "id": 5, "column": 2, "kind": "data", "offset": 94643, "length": 58 } ], "timezone": "US/Pacific" } ] } ```
wgtmac commented 3 weeks ago

@bdice I think we can keep those files, as they were created by legacy writers: "format": "0.12", "writer version": "HIVE-8732", "software version": "ORC Java". We can use the latest writer to generate new files with equivalent data but with the new time zone names.