Closed metadaddy closed 1 year ago
Hi @findinpath,
The queries in the README don't go into too much detail, but in writing this blog post on querying the Drive Stats data using Trino, I had to write queries such as:
SELECT serial_number, MAX(DATE(FORMAT('%04d-%02d-%02d',
year, month, day))) AS date
FROM drivestats
GROUP BY serial_number;
I realized that adding the date column to the schema, alongside the year/month/day we originally specified for the sake of partitioning, would make this sort of query much more straightforward. Of course, this should all go away if/when we migrate from Hive to Iceberg.
I've regenerated the Parquet data in B2 with the date
column, so I'd like the schema here to match it.
We added a
date
column to the drivestats table to allow for easier querying - it's much easier to, for example, find the most recent entry in the data set than it was using theyear
,month
andday
columns.