dedupeio / dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
https://docs.dedupe.io
MIT License

How do you make a gazetteer when individuals have multiple addresses? #398

Closed: lminer closed this issue 9 years ago

lminer commented 9 years ago

Reading the documentation, it seems that a gazetteer needs clean, distinct, individual-level data. What do you do if an individual has moved, changed jobs, etc. many times? Do you include multiple observations per individual, with the blanks intelligently filled in?
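To make the idea in the question concrete, here is a minimal sketch of "multiple observations per individual with the blanks filled in". It uses plain Python only and calls no dedupe API. The field names, the `person_id` key, and the helpers `fill_blanks` and `build_canonical` are hypothetical, invented for illustration. Whether a gazetteer copes well with several canonical records per person is not settled in this thread.

```python
from collections import defaultdict

# Hypothetical source data: one row per observed address/employer.
# "person_id" is an assumed key that ties rows to the same individual.
observations = [
    {"person_id": "p1", "name": "Ada Smith", "address": "12 Oak St", "employer": "Acme"},
    {"person_id": "p1", "name": "Ada Smith", "address": "98 Elm Ave", "employer": None},
    {"person_id": "p2", "name": "Bo Chen", "address": "5 Pine Rd", "employer": "Initech"},
]


def fill_blanks(rows):
    """Fill missing fields with the last known value for the same person."""
    last_seen = defaultdict(dict)
    filled = []
    for row in rows:
        known = last_seen[row["person_id"]]
        complete = {k: (v if v is not None else known.get(k)) for k, v in row.items()}
        # Remember every non-missing value for later rows of this person.
        known.update({k: v for k, v in complete.items() if v is not None})
        filled.append(complete)
    return filled


def build_canonical(rows):
    """Return (records keyed by record id, mapping from record id to person)."""
    canonical = {}
    record_to_person = {}
    for i, row in enumerate(fill_blanks(rows)):
        record_id = f"rec{i}"
        # The matcher would see only the descriptive fields, not the person key.
        canonical[record_id] = {k: v for k, v in row.items() if k != "person_id"}
        record_to_person[record_id] = row["person_id"]
    return canonical, record_to_person


canonical, record_to_person = build_canonical(observations)
```

With this layout, a match against any one of a person's canonical records can be mapped back to the individual through `record_to_person`, so both `rec0` and `rec1` resolve to `p1`.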

fgregg commented 9 years ago

These kinds of questions are better suited to the mailing list or Stack Overflow. Thanks!

lminer commented 9 years ago

Is there a specific tag that you use?

fgregg commented 9 years ago

Not yet. dedupe may be good. We'll keep an eye out :)

lminer commented 9 years ago

Done, although people on SO don't seem to like it: https://stackoverflow.com/questions/31324582/how-do-you-make-a-gazetteer-for-dedupe-when-individuals-have-multiple-addresses

fgregg commented 9 years ago

python-dedupe may be more informative going forward. We'll see.
