TheFeshy / BookSift

A plugin for Calibre to do book identification and duplicate detection using the book's text instead of metadata

Implement and speed-test "early exit" comparison code #9

Closed: TheFeshy closed this issue 13 years ago

TheFeshy commented 13 years ago

Instead of computing the full intersection of the two books' sets of unique words, we can keep a running count of "missed" words (words in one set that are absent from the other) and exit as soon as that count passes a limit set by the comparison threshold. This may or may not improve performance; it needs to be benchmarked to find out.
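A minimal Python sketch of the idea, with both variants side by side so they can be speed-tested against each other. Everything here is hypothetical: the function names, the `benchmark` helper, the sample data, and the assumption that similarity means the fraction of book A's unique words that also appear in book B (the issue does not define the measure). The early-exit version re-checks the same ratio after each miss, so it should give exactly the same answers as the intersection version; it just stops once the threshold can no longer be reached.

```python
import timeit


def similarity_full(words_a: set, words_b: set) -> float:
    """Baseline: build the whole intersection, then take the overlap ratio.

    Assumed (hypothetical) measure: fraction of A's unique words found in B.
    """
    if not words_a:
        return 0.0
    return len(words_a & words_b) / len(words_a)


def is_match_full(words_a: set, words_b: set, threshold: float) -> bool:
    return similarity_full(words_a, words_b) >= threshold


def is_match_early_exit(words_a: set, words_b: set, threshold: float) -> bool:
    """Count words of A missing from B; give up once the threshold is unreachable."""
    n = len(words_a)
    if n == 0:
        return 0.0 >= threshold  # same answer as the baseline for an empty set
    misses = 0
    for word in words_a:
        if word not in words_b:
            misses += 1
            # The ratio only goes down as misses grow, so once it drops
            # below the threshold no later word can rescue the match.
            if (n - misses) / n < threshold:
                return False
    # Final check uses the same expression as the baseline (covers threshold > 1).
    return (n - misses) / n >= threshold


def benchmark(words_a, words_b, threshold, repeat=5, number=200):
    """Return the best-of-`repeat` time (seconds) for `number` calls of each variant."""
    full = min(timeit.repeat(lambda: is_match_full(words_a, words_b, threshold),
                             repeat=repeat, number=number))
    early = min(timeit.repeat(lambda: is_match_early_exit(words_a, words_b, threshold),
                              repeat=repeat, number=number))
    return full, early


if __name__ == "__main__":
    # Hypothetical data: two "books" that share half of their vocabulary.
    book_a = {f"word{i}" for i in range(20000)}
    book_b = {f"word{i}" for i in range(10000, 30000)}
    # 0.4 is a match (no early exit possible); 0.9 is rejected partway through.
    for t in (0.4, 0.9):
        full_s, early_s = benchmark(book_a, book_b, t, number=20)
        print(f"threshold={t}: full={full_s:.4f}s  early-exit={early_s:.4f}s")
```

One caveat worth measuring: in CPython, `words_a & words_b` runs in C while the early-exit loop runs as Python bytecode, so the loop may only come out ahead when most comparisons fail early (that is, when most pairs of books are not duplicates).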