optimize: speed up stat gen by factor x15

Made the stat generation faster using Rayon's thread pool.

Various improvements, such as:

- slightly less copying
- optimized MG
- process in parallel using Rayon, taking advantage of fold/reduce
- moved to Parquet (converting into Parquet automatically is still WIP)
- read Parquet in parallel (one reader per row group); this granularity is sufficient for big enough datasets
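
For reviewers unfamiliar with the fold/reduce shape, here is a minimal sketch of the idea, not the code in this PR. It uses only `std::thread::scope` as a stand-in for Rayon's thread pool so it runs without dependencies; with Rayon the same structure would be expressed with `fold` and `reduce` on a parallel iterator. The `Counts` type, the function names, and the in-memory "row groups" are all hypothetical placeholders for the real statistics and Parquet readers.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical per-column statistic: value frequencies.
type Counts = HashMap<i64, u64>;

/// "Fold" step: build a local accumulator from one row group.
/// No state is shared with other workers, so no locking is needed.
fn fold_row_group(rows: &[i64]) -> Counts {
    let mut acc = Counts::new();
    for &v in rows {
        *acc.entry(v).or_insert(0) += 1;
    }
    acc
}

/// "Reduce" step: merge two partial accumulators into one.
fn merge(mut a: Counts, b: Counts) -> Counts {
    for (k, n) in b {
        *a.entry(k).or_insert(0) += n;
    }
    a
}

/// One worker per row group (assumption: row groups are plain vectors here),
/// then merge the partial results on the calling thread.
fn parallel_counts(row_groups: &[Vec<i64>]) -> Counts {
    thread::scope(|s| {
        let handles: Vec<_> = row_groups
            .iter()
            .map(|rg| s.spawn(move || fold_row_group(rg)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .fold(Counts::new(), merge)
    })
}

fn main() {
    let row_groups: Vec<Vec<i64>> = vec![vec![1, 2, 2], vec![2, 3], vec![1, 1, 3]];
    let counts = parallel_counts(&row_groups);
    println!("{} distinct values", counts.len());
}
```

The point of the pattern is that each worker only touches its own accumulator, and merging happens once per partial result instead of once per row.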

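This takes 30s for JOB 1D stats on my computer, vs. 7:30min before. Postgres takes 1:30min to load the data and 22s for the stat gen. So we beat it if loading is counted (30s vs. 1:52min in total), though not on stat gen alone (30s vs. 22s); it depends on how we view it.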

On "real" datacenter hardware (e.g. 512 cores), we would crush it; we'll test that soon.

Finally coming together :-)

cmu-db / optd

optimize: speed up stat gen by factor x15 #167