Optimize #incidence_subgroups creation

When calculating incidence on large cohort sizes (>10M people), the brute-force approach where the individual persons in the dataset are SUM()....GROUP BY leads to an aggregation of 10M Records per TAR, per Outcome (ie: rows number of TARs number of Outcomes). Then, depending on the strata settings, this Very Large Dataset is scanned multiple times to do the different age, sex, year stratification, leading to additional sweeps on this intermediate table.

To optimize this, the 'baseline' Time-at-Risk can be aggregated up front, and then only those persons with cases can be aggregated to calculate exclusion time that then feed back into the TAR aggregate to create the completed #incidence_subgroup table.

OHDSI / CohortIncidence

Optimize #incidence_subgroups creation #46