dedupeio / dedupe

:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
https://docs.dedupe.io
MIT License
4.15k stars 551 forks source link

consider amortized costs for branch and bound heuristics #1176

Open fgregg opened 11 months ago

fgregg commented 11 months ago

$c_{\text{predicate}} = \frac{\sum_i \frac{1}{|\text{predicates covering coreferent pair}_i|}}{\sum_j \frac{1}{|\text{predicates covering distinct pair}_j|}}$