When calculating incidence on large cohorts (>10M people), the brute-force approach aggregates individual person-level rows with SUM() ... GROUP BY, producing roughly 10M records per time-at-risk (TAR) window, per outcome (i.e., rows x number of TARs x number of outcomes). Then, depending on the strata settings, this very large intermediate table is scanned again for each of the age, sex, and year stratifications, adding further sweeps over the same data.
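A back-of-envelope calculation makes the scale of that intermediate table concrete. The TAR and outcome counts below are hypothetical, chosen only to illustrate the rows x TARs x outcomes blow-up described above:

```python
# Hypothetical sizing of the brute-force intermediate table.
# Only the 10M cohort size comes from the text; TAR/outcome counts are assumed.
persons = 10_000_000
n_tar = 4         # assumed number of time-at-risk windows
n_outcomes = 20   # assumed number of outcomes

intermediate_rows = persons * n_tar * n_outcomes
print(intermediate_rows)  # 800,000,000 person-level rows,
                          # re-scanned once per stratification pass
```

Every additional stratification (age, sex, year) is another full sweep over those rows, which is why the cost grows quickly with strata settings.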
To optimize this, the 'baseline' time-at-risk can be aggregated up front; only the persons with cases then need a second aggregation to compute exclusion time, which is fed back into the TAR aggregate to produce the completed #incidence_subgroup table.
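The two-pass idea above can be sketched with in-memory SQL. This is a minimal illustration, not the actual CohortIncidence query: the table names (person_tar, person_case) and columns are assumed for the example. The point is that the baseline aggregate scans the full cohort only once, the exclusion aggregate scans only the (much smaller) case set, and the two small aggregates are joined instead of re-scanning person-level rows:

```python
# Sketch (assumed schema) of the two-pass aggregation: aggregate baseline
# time-at-risk once, compute exclusion time only for persons with cases,
# then join the two small aggregates.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical person-level TAR table: one row per person per TAR window.
cur.execute("CREATE TABLE person_tar (person_id INT, tar_id INT, tar_days INT)")
# Hypothetical case table: persons with an outcome and their time to exclude.
cur.execute("CREATE TABLE person_case (person_id INT, tar_id INT, excluded_days INT)")
cur.executemany("INSERT INTO person_tar VALUES (?,?,?)",
                [(1, 1, 365), (2, 1, 200), (3, 1, 100)])
cur.executemany("INSERT INTO person_case VALUES (?,?,?)", [(2, 1, 50)])

# Pass 1 (subquery b): baseline TAR aggregated once over all persons.
# Pass 2 (subquery e): exclusion time aggregated only over the case set.
cur.execute("""
    SELECT b.tar_id,
           b.persons_at_risk,
           b.baseline_days - COALESCE(e.excluded_days, 0) AS adjusted_tar_days
    FROM (SELECT tar_id, COUNT(*) AS persons_at_risk,
                 SUM(tar_days) AS baseline_days
          FROM person_tar GROUP BY tar_id) b
    LEFT JOIN (SELECT tar_id, SUM(excluded_days) AS excluded_days
               FROM person_case GROUP BY tar_id) e
      ON b.tar_id = e.tar_id
""")
result = cur.fetchall()
# result: [(1, 3, 615)] -> 3 persons, 665 baseline days minus 50 excluded
```

The LEFT JOIN matters: TAR windows with no cases must keep their full baseline days, which COALESCE handles by treating missing exclusion time as zero.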