Recent Parquet library versions (1.12.2) have started to complain about the int96 timestamps:
$> parquet-cli cat -c fetch_time -n 5 s3a://commoncrawl/cc-index/table/cc-main/warc/crawl=CC-MAIN-2018-43/subset=warc/part-00247-f47c372a-e3d4-4f2b-b7a0-a939c04fd01e.c000.gz.parquet
Argument error: INT96 is deprecated. As interim enable READ_INT96_AS_FIXED flag to read as byte array.
See #7 and the announcement of the January 2020 crawl.
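For reference, if the int96 values are read as 12-byte arrays (as the error message suggests), they can be decoded by hand. The following is a minimal sketch, not part of any Parquet tooling: the function name `int96_to_datetime` is hypothetical, and it assumes the common legacy layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and that the values are stored normalized to UTC.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte legacy INT96 timestamp (hypothetical helper).

    Assumed layout: little-endian int64 nanoseconds within the day,
    followed by a little-endian int32 Julian day number.
    """
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime only has microsecond resolution, so sub-microsecond
    # nanoseconds are truncated here.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)


if __name__ == "__main__":
    # One hour into the day after the Unix epoch.
    sample = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    print(int96_to_datetime(sample).isoformat())
```

Whether the decoded value should be interpreted as UTC depends on the writer, so the time zone handling above is an assumption to verify against known `fetch_time` values.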
No complaints for data from 2020 and newer:
Tasks: