TheFeshy / BookSift

A plugin for Calibre to do book identification and duplicate detection using the book's text instead of metadata

Implement sliding scale and double check for anthologies #12

Closed TheFeshy closed 13 years ago

TheFeshy commented 13 years ago

The algorithm does poorly at identifying anthologies: books contained within anthologies show up as 20%-40% matches, while unrelated books sometimes score as high as 15%-20% in our test sample, which leaves little margin between the two. We need a two-pronged approach:

The cons of this approach are also twofold:

However, if these lookups are rare (anthologies usually make up only a small portion of a collection), the overall impact should be minimal. Testing will confirm.

TheFeshy commented 13 years ago

The current fast algorithm beats all the slow algorithms I have tried for accuracy. We may re-investigate this with other algorithms, such as shingling, but for now I'll close this.
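
For anyone unfamiliar with shingling, here is a minimal sketch of the general technique. It is not BookSift's code and has not been tried against the test sample. The function names and the shingle width `k` are assumptions chosen for illustration. It compares two texts by their sets of overlapping word windows and reports both resemblance (Jaccard similarity) and containment, which may be the more useful number when one book sits inside a larger anthology.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word windows) in text.

    k=3 is an arbitrary illustrative width, not a tuned value.
    """
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def containment(a, b):
    """Fraction of a's shingles that also appear in b."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


# Toy data: a "story" and an "anthology" that includes it plus other text.
story = "the quick brown fox jumps over the lazy dog"
anthology = story + " a completely different tale about seven sleepy cats in paris"

s = shingles(story)
a = shingles(anthology)
print("resemblance:", resemblance(s, a))
print("containment:", containment(s, a))
```

On this toy input the story is fully contained in the anthology even though the resemblance is low, since the anthology's extra text enlarges the union. Whether that holds up on real books, and at what cost in speed, would need testing.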