The algorithm does a poor job of identifying anthologies; books contained within anthologies show up as only 20-40% matches, while books with no relation at all sometimes show up as high as 15-20% in our test sample, so the two ranges nearly overlap. We need a two-pronged approach:
Adjust "match" threshold dynamically based on the size difference between the two books.
If we get a match that is a low percentage, we might want to double-check it with our D-L algorithm, using a subset of
words from the original text (not the unique words!)
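A rough Python sketch of both steps (assuming "D-L" refers to the Damerau-Levenshtein edit distance, here the restricted/optimal-string-alignment variant applied to word sequences; `dynamic_threshold`, `confirm_match`, and the size-ratio scaling rule are all hypothetical illustrations, not the shipped code):

```python
def dynamic_threshold(size_a, size_b, base=0.5):
    """Hypothetical rule: scale the match threshold by the size ratio.
    A book contained in an anthology can only ever match a fraction of
    the anthology's text, so a large size gap should lower the bar."""
    ratio = min(size_a, size_b) / max(size_a, size_b)
    return base * ratio


def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance
    between two sequences; works on word lists as well as strings."""
    la, lb = len(a), len(b)
    d = [[0] * (lb + 1) for _ in range(la + 1)]
    for i in range(la + 1):
        d[i][0] = i
    for j in range(lb + 1):
        d[0][j] = j
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + cost)  # transposition
    return d[la][lb]


def confirm_match(words_a, words_b, window=200):
    """Double-check a borderline hit: run D-L over a window of consecutive
    words from the original text (not the unique-word sets) and return a
    similarity in [0, 1]."""
    a, b = words_a[:window], words_b[:window]
    dist = damerau_levenshtein(a, b)
    return 1.0 - dist / max(len(a), len(b), 1)
```

On a borderline hit, something like `confirm_match(words_a, words_b) >= dynamic_threshold(len(words_a), len(words_b))` would accept or reject the match before trusting the fast algorithm's low percentage.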
The cons of this approach are also twofold:
1. Getting a new set of text from a file is potentially very slow.
2. The D-L compare itself is very slow.
However, if these lookups are rare (and anthologies don't usually make up a large portion of a collection), the overall impact should be minimal. Testing will confirm.
The current fast algorithm beats every slower algorithm I have tried on accuracy. We may re-investigate with other algorithms, such as shingling (sketched below), but for now I'll close this.
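For reference, a shingling comparison could look something like the sketch below (w-shingling over word n-grams with Jaccard similarity; the function names and the window size `w=4` are illustrative, not a tested implementation):

```python
def shingles(words, w=4):
    """The set of overlapping w-word shingles in a word list."""
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def shingle_similarity(words_a, words_b, w=4):
    """Jaccard similarity between the two shingle sets."""
    sa, sb = shingles(words_a, w), shingles(words_b, w)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

For the anthology case specifically, a containment score (`len(sa & sb) / len(sa)`, with `sa` from the smaller book) might serve better than Jaccard, since the contained book's shingles should be a near-subset of the anthology's.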