asah / footprint2009dev

original dev repo for AllForGood.org
http://AllForGood.org/

check for duplicate Habitat for Humanity listings (direct and from servenet) #133

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
this is our first time getting listings from multiple providers

Original issue reported on code.google.com by adam.sah on 10 May 2009 at 1:40

GoogleCodeExporter commented 9 years ago

Original comment by adam.sah on 13 May 2009 at 6:59

GoogleCodeExporter commented 9 years ago

Original comment by adam.sah on 26 May 2009 at 5:28

GoogleCodeExporter commented 9 years ago

Original comment by adam.sah on 26 May 2009 at 7:49

GoogleCodeExporter commented 9 years ago
erring on the side of adding searchqual label (for searchqual prioritization)

Original comment by adam.sah on 3 Jun 2009 at 9:54

GoogleCodeExporter commented 9 years ago
Got it. Found some good links on
http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison

Looking into the following algos:

http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm
http://en.wikipedia.org/wiki/Fuzzy_string_searching
http://en.wikipedia.org/wiki/Bitap_algorithm
http://en.wikipedia.org/wiki/Damerau–Levenshtein_distance
http://pypi.python.org/pypi/python-Levenshtein/0.10.1

And here are some existing Python libraries using some of the above algos:

http://en.literateprograms.org/Boyer-Moore_string_search_algorithm_(Python)
http://code.google.com/p/google-diff-match-patch/
http://code.google.com/p/pylevenshtein/
http://docs.python.org/library/difflib.html
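
As a rough illustration of the difflib option listed above, here is a minimal sketch that scores two listing titles with `difflib.SequenceMatcher`. The helper names `similarity` and `is_probable_duplicate` and the 0.9 threshold are assumptions made up for this example, not part of the AllForGood code, and the cutoff would need tuning against real listings.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score, ignoring case and extra whitespace."""
    # Normalize so trivial formatting differences do not lower the score.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def is_probable_duplicate(a, b, threshold=0.9):
    """Hypothetical helper: flag a pair whose score meets the assumed threshold."""
    return similarity(a, b) >= threshold
```

Character-level matching like this is sensitive to word order, so it is probably best treated as a first pass rather than a complete duplicate check.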

Original comment by christop...@gmail.com on 9 Jun 2009 at 5:56

GoogleCodeExporter commented 9 years ago
as a matter of good discipline, since v1.5 is technically live,
re-targeting for v1.6

(since there are few issues in 1.6, they won't get "lost in the noise")

Original comment by adam.sah on 19 Jun 2009 at 3:41

GoogleCodeExporter commented 9 years ago

Original comment by adam.sah on 25 Jun 2009 at 6:01

GoogleCodeExporter commented 9 years ago
another example, not sure if we could reasonably catch this
   (without getting false positives)

http://www.adamsah.net:8081/search#q=health&num=10&start=1&vol_loc=801%20Brewster%20Avenue%2C%20redwood%20city%2C%20ca&timeperiod=everything&cache=1

A   
Friendship Center Volunteer
Redwood City, CA - Present - June 22, 2010
The Friendship Center programs are community-based activities that provide
recreation and socialization opportunities for adults with mental illness.
The purpose of the program is to offer people the opportunity to participate
in activities and events that give them a sense of ...
www.idealist.org -
Like - Share

J   
Volunteer Friendship Center
Redwood City, CA 94063 - Present - July 15
The Friendship Center programs are community-based activities that provide
recreation and socialization opportunities for adults with mental illness.
The purpose of the program is to offer people the opportunity to participate
in activities and events that give them a sense of ...
www.volunteersolutions.org -
Like - Share
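
One possible way to catch a pair like the two results above, where the titles use the same words in a different order and the descriptions match: compare titles as word sets and descriptions as character sequences. This is only a sketch under assumptions. The dict keys `title` and `description`, the function names, and both thresholds are hypothetical, and requiring both scores to be high is one guess at how to limit false positives.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-set overlap in 0..1; word order does not matter."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def looks_like_duplicate(listing_a, listing_b, title_min=0.8, desc_min=0.9):
    """Hypothetical check: both title and description must be close."""
    title_score = jaccard(listing_a["title"], listing_b["title"])
    desc_score = SequenceMatcher(
        None,
        listing_a["description"].lower(),
        listing_b["description"].lower(),
    ).ratio()
    return title_score >= title_min and desc_score >= desc_min
```

With this, "Friendship Center Volunteer" and "Volunteer Friendship Center" get a title score of 1.0, so the decision rests on how closely the descriptions match.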

Original comment by adam.sah on 8 Jul 2009 at 3:05