Right now, we store all the data in two tables, `test_run` and `tpch_run`. These tables are highly denormalized, i.e., flattened out so that they never need to be joined with anything. This has caused our database to grow significantly. To reduce the size of our historical database, we should consider normalizing some of the data, e.g., run or cluster data. Moreover, the TPC-H data doesn't use all the columns, which further increases the database size. We also don't store the start or end time of runs, which prevents us from truncating historical data.
We should rethink this design to avoid future problems caused by too much data, such as the recent two-month gap in persisted history caused by CI workers running out of memory (OOM).
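A minimal sketch of what a normalized layout could look like, assuming SQLite purely for illustration. The table and column names (`cluster`, `run`, `test_result`, `started_at`, `ended_at`, etc.) and the helper functions are hypothetical and do not reflect our current schema. Run and cluster metadata is stored once per run instead of being repeated on every result row, and a stored `started_at` timestamp makes time-based truncation a single delete.

```python
import sqlite3

# Hypothetical normalized schema: cluster and run metadata live in their own
# tables, and each result row references its run by id instead of repeating it.
SCHEMA = """
CREATE TABLE cluster (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE run (
    id         INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL REFERENCES cluster(id),
    started_at TEXT NOT NULL,  -- ISO-8601 timestamp; enables truncation by age
    ended_at   TEXT
);
CREATE TABLE test_result (
    id         INTEGER PRIMARY KEY,
    run_id     INTEGER NOT NULL REFERENCES run(id) ON DELETE CASCADE,
    test_name  TEXT NOT NULL,
    outcome    TEXT NOT NULL,
    duration_s REAL
);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the hypothetical normalized schema."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (and ON DELETE CASCADE) when enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def truncate_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Delete runs that started before `cutoff`; their results are removed by
    the cascade. Returns the number of deleted runs."""
    with conn:
        cur = conn.execute("DELETE FROM run WHERE started_at < ?", (cutoff,))
    return cur.rowcount
```

For example, after inserting one old and one recent run, `truncate_before(conn, "2023-03-01T00:00:00")` removes the old run together with all of its result rows. Comparing ISO-8601 strings works here only because every timestamp uses the same format.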