coiled / benchmarks

Rethink how we persist historical data of scheduled benchmarking runs #1524

Open hendrikmakait opened 1 month ago

hendrikmakait commented 1 month ago

Right now, we store all the data in two tables, test_run and tpch_run. These tables are highly denormalized, i.e., flattened out so that they don't need to be joined with anything. As a result, every row repeats the same run and cluster data, which has caused our database to grow significantly. To reduce the size of our historical database, we should think about normalizing some of the data, e.g., run or cluster data. Moreover, we don't use all of the columns for TPC-H data, which further increases the database size. We also don't store the start or end time of runs, which stops us from truncating historical data.
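To make the idea concrete, here is a minimal sketch of what a normalized layout could look like, using Python's built-in sqlite3 module. This is only an illustration under assumptions: the run table, all column names (ci_run_url, cluster_name, start_time, end_time, name, duration), and the helper functions are hypothetical and do not reflect the current schema. Per-run data is stored once in run, test_run rows only reference it, and storing start_time makes it possible to truncate old history.

```python
import sqlite3

# Hypothetical schema: per-run data lives in one place, results reference it.
SCHEMA = """
CREATE TABLE run (
    id INTEGER PRIMARY KEY,
    ci_run_url TEXT NOT NULL,   -- hypothetical column
    cluster_name TEXT,          -- hypothetical column
    start_time TEXT NOT NULL,   -- ISO 8601 UTC timestamp
    end_time TEXT
);
CREATE TABLE test_run (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES run(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    duration REAL
);
"""


def connect(path=":memory:"):
    """Open a database and create the sketch schema."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_run(conn, ci_run_url, cluster_name, start_time, end_time, results):
    """Insert one run and its (name, duration) results; return the run id."""
    cur = conn.execute(
        "INSERT INTO run (ci_run_url, cluster_name, start_time, end_time) "
        "VALUES (?, ?, ?, ?)",
        (ci_run_url, cluster_name, start_time, end_time),
    )
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO test_run (run_id, name, duration) VALUES (?, ?, ?)",
        [(run_id, name, duration) for name, duration in results],
    )
    conn.commit()
    return run_id


def truncate_before(conn, cutoff):
    """Delete runs that started before cutoff; return the number of runs removed.

    The matching test_run rows are removed by ON DELETE CASCADE.
    """
    cur = conn.execute("DELETE FROM run WHERE start_time < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

With a layout along these lines, dropping history older than some cutoff would be a single call such as truncate_before(conn, "2024-03-01T00:00:00"), assuming all timestamps are stored in the same ISO 8601 format so that string comparison matches chronological order.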

We should rethink this to avoid future problems caused by too much data, like the recent two-month gap in persisted history caused by CI workers running out of memory (OOM).