Rethink how we persist historical data of scheduled benchmarking runs

hendrikmakait commented 3 months ago

Right now, we store all the data in two tables, test_run and tpch_run. These tables are highly denormalized, i.e., flattened out so that they don't need to be joined with anything. This has caused our database to grow significantly. To reduce the size of our historical database, we should think about normalizing some of the data, e.g., run or cluster data. Moreover, we store columns for the TPC-H data that we never use, which further increases the database size. We also don't store the start or end time of runs, which stops us from truncating historical data.
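To make the idea concrete, here is a minimal sketch of what a normalized layout could look like, using SQLite from the Python standard library. Only the table name test_run comes from the current setup; the cluster and run tables, all column names, and the helpers create_db and truncate_before are hypothetical and only illustrate the direction, not the actual schema.

```python
import sqlite3

# Hypothetical schema: everything except the table name test_run is an assumption.
SCHEMA = """
CREATE TABLE cluster (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    worker_vm_type TEXT
);
CREATE TABLE run (
    id INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL REFERENCES cluster(id),
    started_at TEXT NOT NULL,  -- ISO 8601 UTC timestamp
    ended_at TEXT
);
CREATE TABLE test_run (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES run(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    duration REAL
);
"""


def create_db(path=":memory:"):
    """Create a database with the sketched schema and foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def truncate_before(conn, cutoff_iso):
    """Delete runs that started before cutoff_iso; their test_run rows cascade.

    Returns the number of deleted run rows.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM run WHERE started_at < ?", (cutoff_iso,))
    return cur.rowcount
```

With run and cluster data stored once per run instead of once per test, each test_run row only carries a run_id, and having started_at on the run makes truncation a single DELETE, e.g. truncate_before(conn, "2024-01-01T00:00:00").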

We should rethink this to avoid future problems caused by too much data, like the recent two-month gap in persisted history caused by CI workers running out of memory.

shughes-uk commented 3 weeks ago

Would be nice if y'all plotted it as monthly/quarterly rollups or something as a one off. I'm sad I can't go find the historical stuff!

shughes-uk commented 3 weeks ago

Never mind, it looks like y'all have erased them permanently 😢

coiled/benchmarks #1524