As it stands, pgsql_big_dedupe_example fails when run with dedupe v0.8.0.1.7 (the latest version currently available on PyPI). Running python pgsql_big_dedupe_example.py generates the following output:
INFO:root:Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
INFO:root:Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
reading from pgsql_big_dedupe_example_settings
blocking...
creating blocking_map database
creating inverted index
Traceback (most recent call last):
  File "pgsql_big_dedupe_example.py", line 177, in <module>
    deduper.blocker.index(field_data, field)
  File "/home/ec2-user/dedupe-examples/pgsql_big_dedupe_example/local/lib/python2.7/site-packages/dedupe/blocking.py", line 74, in index
    index.index(preprocess(doc))
  File "/home/ec2-user/dedupe-examples/pgsql_big_dedupe_example/local/lib/python2.7/site-packages/dedupe/predicates.py", line 161, in preprocess
    return tuple(ngrams(doc.replace(' ', ''), 2))
AttributeError: 'tuple' object has no attribute 'replace'
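The traceback suggests that each item reaching preprocess is a tuple rather than a string, which is what happens when whole database rows are passed instead of the field values they contain. The following is a minimal, self-contained sketch of that failure mode, not the library's code: ngrams and preprocess are stand-ins modelled on the line shown in the traceback, and the rows list is hypothetical sample data.

```python
def ngrams(text, n):
    """Return the overlapping character n-grams of text (illustrative stand-in)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def preprocess(doc):
    # Modelled on the line shown in the traceback; assumes doc is a string.
    return tuple(ngrams(doc.replace(' ', ''), 2))


def try_preprocess(doc):
    """Run preprocess and return either its result or the error message."""
    try:
        return preprocess(doc)
    except AttributeError as exc:
        return str(exc)


# Hypothetical rows, shaped like the one-column tuples a database cursor yields.
rows = [("123 main st",), ("456 oak ave",)]

# Passing the row itself reproduces the error from the traceback.
failure = try_preprocess(rows[0])

# Passing the field value inside the row works as intended.
success = try_preprocess(rows[0][0])

print(failure)
print(success)
```

Under this reading, the data handed to the indexing call needs to be the field values themselves rather than the rows that wrap them.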
This PR fixes this issue by bringing the call to deduper.blocker.index up to date with the current documentation for dedupe v0.8: