dedupeio / dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
https://docs.dedupe.io
MIT License
4.14k stars 549 forks

RNN for deduplicating #505

Open fgregg opened 7 years ago

fgregg commented 7 years ago

This has some very interesting implications for deduping. https://github.com/MajorTal/DeepSpell/blob/master/keras_spell.py

attn @mccc

fgregg commented 6 years ago

yay! https://github.com/iesl/learned-string-alignments

fgregg commented 2 years ago

Siamese models: https://medium.com/peak-product/towards-reusable-entity-resolution-eed1c6ee4a14

fgregg commented 2 years ago

https://medium.com/@gerrit.anders/accelerate-through-matched-data-42d4a11d6d4d

fgregg commented 2 years ago

https://github.com/megagonlabs/ditto
https://github.com/anhaidgroup/deepmatcher