Cross-edition (or "memory") inference (ETL-level)

Depends on:

Make the system capable of inferring addresses by comparing editions of the same dataset published at different points in time. This includes:

defining how all ETLs should behave when doing cross-edition inference, and
implementing the necessary supporting software (e.g. reusable libraries / infrastructure).
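
As a rough illustration of the kind of reusable building block such a library might offer, the sketch below compares two editions of a dataset and reports which addresses persisted, appeared, or disappeared. Everything here is an assumption made for illustration: the names `normalise`, `EditionDiff` and `compare_editions` are hypothetical, editions are treated as plain lists of address strings, and the card does not say which conclusions an ETL should draw from the differences.

```python
from dataclasses import dataclass
from typing import Iterable


def normalise(address: str) -> str:
    """Collapse case and whitespace so trivially different spellings compare equal."""
    return " ".join(address.upper().split())


@dataclass(frozen=True)
class EditionDiff:
    """Hypothetical result type: how two editions of one dataset relate."""
    persisted: frozenset  # addresses present in both editions
    added: frozenset      # addresses present only in the newer edition
    dropped: frozenset    # addresses present only in the older edition


def compare_editions(older: Iterable[str], newer: Iterable[str]) -> EditionDiff:
    """Compare two editions given as iterables of raw address strings."""
    old = frozenset(normalise(a) for a in older)
    new = frozenset(normalise(a) for a in newer)
    return EditionDiff(persisted=old & new, added=new - old, dropped=old - new)
```

An ETL could feed a result like this into whatever inference rule it adopts, for example treating addresses that persist across editions differently from ones seen only once. That rule is deliberately left out of the sketch.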

The current hypothesis is that each ETL, rather than the distiller, will be responsible for its own cross-edition inference.

Ideally, the inference algorithms should make use of the statistical confidence evaluation.

This card does not cover the actual implementation for existing ETLs. For example, see https://github.com/OpenAddressesUK/roadmap/issues/18 for the Companies House ETL.

The solution should be able to rely on past editions of the source data held in OA's deep archive when they are no longer available from the publisher (see https://github.com/OpenAddressesUK/roadmap/issues/10).
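
The fallback described above could be sketched as follows. This is only a minimal illustration under stated assumptions: `load_edition` and the `Fetcher` type are hypothetical names, the two fetchers are stand-ins for whatever access the publisher and the deep archive actually offer, and a fetcher is assumed to return `None` when it does not hold the requested edition.

```python
from typing import Callable, Optional, Tuple

# Hypothetical fetcher type: takes an edition identifier and returns the raw
# bytes of that edition, or None if this source does not hold it.
Fetcher = Callable[[str], Optional[bytes]]


def load_edition(edition_id: str,
                 from_publisher: Fetcher,
                 from_archive: Fetcher) -> Tuple[bytes, str]:
    """Return (data, origin), preferring the publisher and falling back to the archive."""
    data = from_publisher(edition_id)
    if data is not None:
        return data, "publisher"
    data = from_archive(edition_id)
    if data is not None:
        return data, "archive"
    raise LookupError(f"edition {edition_id!r} not found at publisher or in archive")
```

Returning the origin alongside the data lets the calling ETL record where each edition came from. Whether that provenance matters for inference is not stated in this card.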

As in other contexts where we are introducing features such as inference or statistical confidence calculation in Beta, we are not aiming to implement every possible methodology, only those for which we can build the best business case. For inference, that means the best ratio between the volume of addresses inferred and the cost of producing them (where cost includes development effort, cost of operations, etc.).
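
One way to make that comparison concrete is to rank candidate methodologies by estimated addresses inferred per unit of cost. The sketch below is an assumption about how this could be expressed, not an agreed method: `Candidate` and `rank_by_business_case` are hypothetical names, and both the volume and the cost figures would have to be estimates supplied by the team.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of one inference methodology under consideration."""
    name: str
    expected_addresses: int  # estimated volume of addresses it would infer
    cost: float              # development plus operations, in one consistent unit

    @property
    def addresses_per_cost(self) -> float:
        if self.cost <= 0:
            raise ValueError("cost must be positive")
        return self.expected_addresses / self.cost


def rank_by_business_case(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Order candidates from best to worst ratio of addresses inferred to cost."""
    return sorted(candidates, key=lambda c: c.addresses_per_cost, reverse=True)
```

A single ratio ignores factors such as address quality or strategic value, so it would be at most one input to the decision.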

OpenAddressesUK / roadmap, issue #8