18F / census-similarity

Small set of commands to find similarity between data sets

Ideas for what the tool can do #11

Open EricSchles opened 8 years ago

EricSchles commented 8 years ago

Hey all,

Here are some ideas for the tool for textual data:

1) deduping text - names, places, entities

2) deduping addresses and phone numbers: https://github.com/EricSchles/investigator/blob/master/code/investigator/app/text_parser.py. Since I already built this functionality and the objects are written, I figure I'll just steal the code and put it here.
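
To make the two ideas concrete, here is a rough sketch of what they could look like. It is not taken from the linked `text_parser.py`: the function names (`normalize`, `dedupe`, `normalize_phone`), the 0.85 threshold, and the choice of the standard library's `difflib` for fuzzy matching are all assumptions for illustration, and the phone handling assumes US-style ten-digit numbers.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def dedupe(values, threshold=0.85):
    """Keep the first occurrence of each group of near-duplicate strings.

    Two strings count as duplicates when the similarity ratio of their
    normalized forms is at least `threshold` (an assumed cutoff).
    """
    kept = []
    for value in values:
        candidate = normalize(value)
        is_duplicate = any(
            SequenceMatcher(None, candidate, normalize(existing)).ratio() >= threshold
            for existing in kept
        )
        if not is_duplicate:
            kept.append(value)
    return kept


def normalize_phone(raw):
    """Reduce a phone number to its last ten digits (US-style assumption)."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:] if len(digits) >= 10 else digits


names = ["Washington, D.C.", "washington dc", "Baltimore"]
unique_names = dedupe(names)

phones = ["(202) 555-0143", "+1 202-555-0143"]
unique_phones = sorted({normalize_phone(p) for p in phones})
```

The pairwise comparison is quadratic in the number of values, so for gigantic CSVs this would need blocking or an index in front of it; the sketch only shows the matching step.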

cmc333333 commented 7 years ago

Very neat, and certainly something that comes up in gigantic CSVs. For idea 2: what do you think about packaging up your code so we could use it as a library rather than copy-pasting? I can definitely help there, if you want.