bitsondatadev / trino-getting-started

Apache License 2.0
250 stars 100 forks source link

Add date column to drivestats table #11

Closed metadaddy closed 1 year ago

metadaddy commented 1 year ago

We added a date column to the drivestats table to allow for easier querying - it's much easier to, for example, find the most recent entry in the data set than it was using the year, month and day columns.

metadaddy commented 1 year ago

Hi @findinpath,

The queries in the README don't go into too much detail, but in writing this blog post on querying the Drive Stats data using Trino, I had to write queries such as:

SELECT serial_number, MAX(DATE(FORMAT('%04d-%02d-%02d', 
year, month, day))) AS date
    FROM drivestats 
    GROUP BY serial_number;

I realized that adding the date column to the schema, alongside the year/month/day we originally specified for the sake of partitioning, would make this sort of query much more straightforward. Of course, this should all go away if/when we migrate from Hive to Iceberg.

I've regenerated the Parquet data in B2 with the date column, so I'd like the schema here to match it.