Currently, the individual census tables are filtered through the use of needed datasets and a corresponding partition.
As begun in #92 (see this section), the config for derived columns can be expanded to include:
- Geography level
- Aggregation column (IMO just define a (DF -> DF) function that gets called to generate the new statistic)
To enable the above, the type for derivation config (currently: dict[str, tuple[str, list[DerivedColumn]]]) can be updated to include the extra required items.
This could be something like:
from dataclasses import dataclass
from typing import Callable

import pandas as pd

# One per derived table
@dataclass
class DerivedColumn:
    hxltag: str
    aggregation_func: Callable[[pd.DataFrame], pd.DataFrame]
    output_column_name: str
    human_readable_name: str

# One per source table
@dataclass
class MetricDerivationInstructions:
    geography_level: str
    geo_id_col_name: str
    derived_columns: list[DerivedColumn]
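A rough sketch of how such a config might be consumed, to make the (DF -> DF) idea concrete. Everything beyond the two class shapes above is an assumption for illustration: the `derive_metrics` helper, the `sum_under_18` aggregation, the column names, the hxltag value, the "example_table" key, and the convention that each aggregation function returns a single-column DataFrame indexed by the geography ID. The config type is assumed to become dict[str, MetricDerivationInstructions].

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class DerivedColumn:
    hxltag: str
    aggregation_func: Callable[[pd.DataFrame], pd.DataFrame]
    output_column_name: str
    human_readable_name: str


@dataclass
class MetricDerivationInstructions:
    geography_level: str
    geo_id_col_name: str
    derived_columns: list[DerivedColumn]


def sum_under_18(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical aggregation: add two made-up age-band columns."""
    return (df["age_0_9"] + df["age_10_17"]).to_frame("value")


def derive_metrics(
    source: pd.DataFrame, instructions: MetricDerivationInstructions
) -> pd.DataFrame:
    """Hypothetical helper: apply every aggregation_func to one source table."""
    indexed = source.set_index(instructions.geo_id_col_name)
    out = pd.DataFrame(index=indexed.index)
    for col in instructions.derived_columns:
        result = col.aggregation_func(indexed)
        # Assumption: each function returns exactly one column of values.
        out[col.output_column_name] = result.iloc[:, 0]
    return out.reset_index()


# Assumed new config type: one set of instructions per source table.
config: dict[str, MetricDerivationInstructions] = {
    "example_table": MetricDerivationInstructions(
        geography_level="example_level",  # made-up level name
        geo_id_col_name="geo_id",  # made-up column name
        derived_columns=[
            DerivedColumn(
                hxltag="#population+children+age0_17",  # made-up tag
                aggregation_func=sum_under_18,
                output_column_name="under_18",
                human_readable_name="Children aged 0 to 17",
            )
        ],
    )
}

# Toy stand-in for one census table.
source = pd.DataFrame(
    {"geo_id": ["A", "B"], "age_0_9": [10, 20], "age_10_17": [5, 7]}
)
derived = derive_metrics(source, config["example_table"])
```

Keeping the aggregation as a plain function means the per-country config only has to say which columns to combine, while the looping and indexing logic could be shared across countries.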
Also check whether the needed_datasets and source_metrics assets can be skipped entirely.
Following any refactoring, this pattern should be readily applicable to other countries to be updated in the pipeline (e.g. Scotland, NI, England/Wales, USA) and to new countries being added that conform to this DAG pattern for how the data is provided.
The original aim of this issue is superseded by porting Northern Ireland (#98). Consider whether to keep it open for incorporating all other census tables as metrics (@andrewphilipsmith for reference).
From discussion with @yongrenjie as part of #92