CIIR / Proteus

Million Book Project

I want to be able to mark duplicates #83

Open jjfiv opened 9 years ago

jjfiv commented 9 years ago

As an example, the first six results for the query "slave trade act" 1807 are duplicates of the exact same page. It would be nice to be able to mark this somehow and prevent them from showing up again separately, at least for me.

mzarozinski commented 9 years ago

That's an interesting issue. There are "true duplicates", where the same book was scanned twice, and duplicates where the book is a different edition. The latter is very interesting to the target audience, as there may be different footnotes or commentary. The former adds little value; ideally we'd only return the one with the best OCR.

jjfiv commented 9 years ago

Even if the different versions have different footnotes, it would be nice to retrieve them together if they all match, so that you can browse other versions for the specific purpose of looking for footnotes or comparing differences. I could imagine a left/right motion (if we were that good) for switching between versions of a book.
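
To make the two cases discussed above concrete, here is a rough sketch of how results might be grouped. It is not Proteus code: the `Hit` record and its fields (`work_key`, `edition`, `ocr_quality`, `score`) are hypothetical, and it assumes some upstream step has already decided which scans belong to the same work and edition. True duplicates (same work, same edition) collapse to the scan with the best OCR, and different editions stay together under one work so they can be browsed side by side.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    # All fields are hypothetical, for illustration only.
    book_id: str        # identifier of one scanned volume
    work_key: str       # shared by every edition of the same work
    edition: str        # distinguishes editions of a work
    ocr_quality: float  # higher means cleaner OCR
    score: float        # retrieval score for the query


def group_hits(hits):
    """Collapse true duplicates and group the remaining editions by work."""
    # Step 1: for each (work, edition) pair keep only the best-OCR scan.
    best = {}
    for hit in hits:
        key = (hit.work_key, hit.edition)
        if key not in best or hit.ocr_quality > best[key].ocr_quality:
            best[key] = hit

    # Step 2: gather the surviving editions under their work.
    groups = defaultdict(list)
    for hit in best.values():
        groups[hit.work_key].append(hit)

    # Step 3: rank works by their best-scoring edition, and order the
    # editions inside each work by score as well.
    ranked = sorted(
        groups.values(),
        key=lambda group: max(h.score for h in group),
        reverse=True,
    )
    return [sorted(g, key=lambda h: h.score, reverse=True) for g in ranked]


hits = [
    Hit("a", "w1", "1807", 0.9, 5.0),
    Hit("b", "w1", "1807", 0.7, 5.1),  # same edition as "a", worse OCR
    Hit("c", "w1", "1810", 0.8, 4.0),  # different edition of the same work
    Hit("d", "w2", "1850", 0.5, 4.5),
]
grouped = group_hits(hits)
```

Each inner list would correspond to one result row, with the left/right motion stepping through its entries. Per-user marking of duplicates could feed the same structure by letting a user assign two scans the same `work_key` and `edition`.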