dedupeio / dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
https://docs.dedupe.io
MIT License

Error when reproducing Gazetteer Example #1090

Open: hlra opened this issue 1 year ago

hlra commented 1 year ago

I am trying to reproduce the Gazetteer example. This line:

    results = gazetteer.search(messy, n_matches=1)

throws the following error:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\dst\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\dst\lib\site-packages\dedupe\api.py", line 930, in search
    return list(results)
  File "C:\ProgramData\Anaconda3\envs\dst\lib\site-packages\dedupe\api.py", line 936, in _format_search_results
    for result in results:
  File "C:\ProgramData\Anaconda3\envs\dst\lib\site-packages\dedupe\api.py", line 867, in many_to_n
    yield from clustering.gazetteMatching(score_blocks, threshold, n_matches)
  File "C:\ProgramData\Anaconda3\envs\dst\lib\site-packages\dedupe\clustering.py", line 313, in gazetteMatching
    for block in scored_blocks:
  File "C:\ProgramData\Anaconda3\envs\dst\lib\site-packages\dedupe\core.py", line 233, in scoreGazette
    first, record_pairs = peek(record_pairs)
  File "C:\ProgramData\Anaconda3\envs\dst\lib\site-packages\dedupe\core.py", line 279, in peek
    first = next(seq)
  File "C:\ProgramData\Anaconda3\envs\dst\lib\site-packages\dedupe\api.py", line 793, in blocks
    pairs = con.execute(
sqlite3.OperationalError: no such table: indexed_records

I tried to look into the code to figure out what's going on, but I'm afraid I don't understand it. I am using dedupe 2.0.17.

fgregg commented 1 year ago

Did you call the index method before calling search?
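The reply points to a call-order problem: the traceback ends in a SQLite query against a table named `indexed_records`, which `search` expects to exist and which is presumably created and populated by `index`. The sketch below is a hypothetical illustration, not dedupe's implementation: the `ToyGazetteer` class is invented for this example, and its exact-match lookup stands in for real fuzzy scoring. It uses only Python's built-in `sqlite3` module to reproduce the same failure mode. Querying before the table exists raises `sqlite3.OperationalError: no such table: indexed_records`, and indexing first makes the identical search succeed.

```python
import sqlite3


class ToyGazetteer:
    """Hypothetical stand-in that mimics an index-then-search contract.

    NOT dedupe's code: it only reproduces the failure mode from the
    traceback using a plain in-memory sqlite3 connection.
    """

    def __init__(self):
        # In-memory database; the table is created only by index().
        self.con = sqlite3.connect(":memory:")

    def index(self, records):
        # Create and populate the table that search() depends on.
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS indexed_records (record_id TEXT, name TEXT)"
        )
        self.con.executemany(
            "INSERT INTO indexed_records VALUES (?, ?)",
            [(rid, rec["name"]) for rid, rec in records.items()],
        )

    def search(self, messy, n_matches=1):
        results = []
        for messy_id, rec in messy.items():
            # Exact-match lookup as a placeholder for real fuzzy scoring.
            # Fails with OperationalError if index() was never called.
            rows = self.con.execute(
                "SELECT record_id FROM indexed_records WHERE name = ? LIMIT ?",
                (rec["name"], n_matches),
            ).fetchall()
            results.append((messy_id, [row[0] for row in rows]))
        return results


if __name__ == "__main__":
    canonical = {"c1": {"name": "acme corp"}, "c2": {"name": "globex"}}
    messy = {"m1": {"name": "acme corp"}}

    g = ToyGazetteer()
    try:
        g.search(messy)  # wrong order: nothing has been indexed yet
    except sqlite3.OperationalError as err:
        print(err)  # no such table: indexed_records

    g.index(canonical)  # right order: index first...
    print(g.search(messy, n_matches=1))  # ...then search -> [('m1', ['c1'])]
```

With the real library, the corresponding fix is to call `gazetteer.index(...)` on the canonical records before calling `gazetteer.search(messy, n_matches=1)`.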