All inference and confidence features were pushed back by one sprint because Fusion was unavailable during early January. The natural target for this card is now sprint #42, fyi @peterkwells.
Depends on:
Make the system capable of inferring addresses by comparing different editions of the same source dataset over time; an illustrative sketch follows the list below. This includes:

* The current hypothesis is that each ETL, and not the distiller, will be responsible for its own cross-edition inference.
* Ideally, the inference algorithms should use the statistical confidence evaluation.
* This card does not cover the actual implementation for the existing ETLs; e.g. see https://github.com/OpenAddressesUK/roadmap/issues/18 for the Companies House ETL.
* The solution should be capable of relying on past editions of the source data from OA's deep archive when they are no longer available from the publisher (see https://github.com/OpenAddressesUK/roadmap/issues/10).
* As in other contexts where we are introducing features such as inference or statistical confidence calculation in Beta, we are not aiming to implement all possible methodologies, only those with the best business case, e.g. for inference the best ratio of the volume of addresses inferred to the cost of producing them (where cost includes development effort, cost of operations, etc.).
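
To make the intent concrete, here is a minimal sketch of what cross-edition inference inside a single ETL could look like. Everything in it is an assumption for illustration: editions are plain sets of address strings keyed by a date label, `InferredAddress` and `DECAY_PER_MISSED_EDITION` are made-up names, and the decayed-confidence rule merely stands in for the real statistical confidence evaluation, which is specified elsewhere on the roadmap.

```python
"""Illustrative sketch only: cross-edition inference inside a single ETL.

Assumptions (not taken from this card): editions are sets of address strings
keyed by a date label, and confidence is a number in [0, 1] that decays for
addresses that have dropped out of more recent editions.
"""

from dataclasses import dataclass


@dataclass
class InferredAddress:
    address: str
    last_seen_edition: str   # label of the most recent edition containing it
    confidence: float        # hypothetical statistical confidence, 0..1


# Hypothetical decay applied each time an address misses an edition.
DECAY_PER_MISSED_EDITION = 0.8


def infer_across_editions(editions: dict[str, set[str]]) -> list[InferredAddress]:
    """Compare successive editions of the same source dataset and keep
    addresses that are no longer published but probably still exist."""
    inferred: dict[str, InferredAddress] = {}
    # Edition labels such as "2015-01" sort chronologically as strings.
    for label in sorted(editions):
        current = editions[label]
        for address in current:
            # Seen in this edition: full confidence, provenance refreshed.
            inferred[address] = InferredAddress(address, label, 1.0)
        for address, record in inferred.items():
            if address not in current:
                # Missing from this edition: keep it, but decay confidence.
                record.confidence *= DECAY_PER_MISSED_EDITION
    return list(inferred.values())


if __name__ == "__main__":
    editions = {
        "2014-11": {"1 High Street", "2 High Street"},
        "2014-12": {"1 High Street", "2 High Street", "3 High Street"},
        "2015-01": {"1 High Street", "3 High Street"},  # 2 High Street dropped
    }
    for record in infer_across_editions(editions):
        print(record)
```

Run against the three toy editions above, the address that drops out of the latest edition is kept but reported with a reduced confidence rather than discarded, which is the behaviour this card is asking for; whether that confidence comes from a simple decay or the full statistical evaluation is left to the implementation.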