DobyTang / LazyLibrarian

This project isn't finished yet. Goal is to create a SickBeard, CouchPotato, Headphones-like application for ebooks. Headphones is used as a base, so there are still a lot of references to it.

Wishlists adding other copies of the same book to LazyLibrarian, but unable to add to Calibre #1613

Closed jcal22 closed 5 years ago

jcal22 commented 5 years ago

LL is adding a book to the books db based on isbn in wishlist. If the book exists with a different isbn in Calibre, it gets added to LL as wanted. When LL downloads the book it can't be added to Calibre because it is seen as a duplicate. A cycle ensues where it tries to download the book and/or stays on wanted list because deleting it from LL just gets it added again next cycle.

Can LL be made to check author/title to lower duplication?

philborman commented 5 years ago

Depends on the wishlist. If the wishlist came from goodreads it should have a goodreads bookid. We search the database for bookid, if not found we try isbn, if still not found we try author name and book title, and finally we try a fuzzy match on author/title. Once it's in the database we just use the goodreads bookid to identify it which should stop duplicates in our own database.

Maybe we need to check for the "duplicate" error message from calibre and update our database to the book location using the calibre bookid. Will take a look.

philborman commented 5 years ago

Can you post a debug log showing the problem so I can see where to check for the "duplicate" message? I've got some free time coming up where I can look at this in more detail.

jcal22 commented 5 years ago

logs.txt

jcal22 commented 5 years ago

There's an error in there that is from my code earlier. I will submit a pull request.

philborman commented 5 years ago

Thanks for the log, but unfortunately it's not helping much. Calibre tells us it's a duplicate, but doesn't give us the bookid, so we can't tell what it matched.

There are a couple of possible scenarios for this issue...

  1. We have the book in our database but fail to match the details
  2. We don't have the book in our database, but calibre has it and we don't realise

For scenario (1): maybe a log of the wishlist search will give more info on why we think it doesn't match anything in our database. If we realised we had the book we wouldn't download another copy. The search/match process is quite complex, but the debug logging should show how/why it failed.

Wishlist searches for the book in our database using finditem() in csv.py
finditem() tries to match the bookid
  if that fails it tries to match isbn
    if that fails it tries author and title using find_book_in_db() in librarysync.py
find_book_in_db() tries an exact match first, and a fuzzy match if that fails
Extra fuzzy match debug logging is available if you set loglevel to 128 (higher loglevels are a bitmask)

If at the end of all that finditem() says the book is not in the database we query goodreads using search_for() to get the bookid. First query uses isbn, if no match we try a fuzzy match on author and title. Once we have the goodreads details we add the book to the database and mark it as "Wanted".

We need to see why finditem() says the book is not already there, or, if it is there, what its status is.

For scenario (2) If we already have the book in the calibre database but it's not in the lazylibrarian database, this could be because we didn't recognise the book on a libraryscan, or rejected it for some reason? Might be worth doing a libraryscan of the one author (go to the individual author page and click ebook scan from there) with debug level 4096 (extra libraryscan logging) and look for any messages about the book title being rejected, or maybe goodreads doesn't list the book under the details we are expecting?
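Since loglevels 128 and 4096 are described as bitmask values, here is a minimal sketch of how such flags can be combined and tested. The constant and function names are hypothetical and are not taken from the LazyLibrarian source; only the two numeric values come from this thread, and how LazyLibrarian itself interprets a combined value is an assumption.

```python
# Hypothetical names; only the values 128 and 4096 appear in this thread.
LOG_FUZZY = 128          # extra fuzzy match logging
LOG_LIBRARYSCAN = 4096   # extra libraryscan logging


def flag_enabled(loglevel, flag):
    """Return True if every bit of `flag` is set in `loglevel`."""
    return (loglevel & flag) == flag


# Assumption: flags can be OR-ed together to enable both kinds of logging.
both = LOG_FUZZY | LOG_LIBRARYSCAN

print(both)
print(flag_enabled(both, LOG_FUZZY), flag_enabled(both, LOG_LIBRARYSCAN))
print(flag_enabled(LOG_LIBRARYSCAN, LOG_FUZZY))
```

If the config value really is treated this way, entering the sum of the two values would turn on both sets of extra logging at once.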

jcal22 commented 5 years ago

In my case they are books in Calibre and LL but also on my GR wishlist. In certain instances LL found different book IDs on library scan than what I added to a wishlist.

When calibredb reports a duplicate, could you do a search for title and author? That seems to be what is matching in the calibredb. If you find it, merge the records.

If the titles are slightly different it's adding to calibredb and you end up with a duplicate in Goodreads which is better than continually going out and trying to download it again and again.
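A rough sketch of what such a lookup could look like, assuming the `calibredb search` command, which prints a comma-separated list of matching book ids. The helper names are hypothetical and this is not how LazyLibrarian currently talks to Calibre; the backslash escaping of double quotes is also an assumption about Calibre's search syntax.

```python
def build_search_args(title, author):
    """Build an argument list for `calibredb search` on title and author."""
    def quote(value):
        # Assumption: calibre accepts backslash-escaped double quotes.
        return value.replace('"', '\\"')

    expression = 'title:"{}" and author:"{}"'.format(quote(title), quote(author))
    return ["calibredb", "search", expression]


def parse_ids(output):
    """Parse the comma-separated id list into integers, skipping anything else."""
    return [int(part) for part in output.strip().split(",") if part.strip().isdigit()]


args = build_search_args("War of the Worlds", "H. G. Wells")
print(args)
# The list could be passed to subprocess.run(args, capture_output=True, text=True)
# and the captured stdout handed to parse_ids().
print(parse_ids("12,45\n"))
```

Any ids returned would be Calibre's own index values, so they could only be used to locate the existing copy, not to match a Goodreads bookid directly.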

jcal22 commented 5 years ago

It might just be title because we are sending metadata after import.

philborman commented 5 years ago

Not sure I quite understand. There are 3 sets of info: calibre, lazylibrarian, wishlist. Which ones match (if any)? Do the details match in calibre and lazylibrarian but the wishlist is different? If calibre and lazylibrarian don't match we need to look at libraryscan to see why. Getting info out of calibredb to search for mismatches is possible, but it would be simpler if we can find the problem on libraryscan, I think.

jcal22 commented 5 years ago

Sorry, I was typing on my phone.

My Scenario:
1) Book exists in Calibre "War of the Worlds" with a bookid generated by Calibre (Existed Pre-LL).
2) On LibraryScan, Calibre "War of the Worlds" is added to books db in LL. At this point they are in sync.
3) "War of the Worlds" added to a goodreads wish list.

I'm guessing at the next part.
4) LL gets book from Wishlist with Goodreads ID, which is different than ID in LL and Calibre.
5) LL downloads book based on Goodreads ID.
6) Calibredb blocks import because Goodreads title for wishlist book is the same as LL title for CalibreID book.
7) Book with goodreads ID stays Wanted in LL and continuously tries to download.

My manual fix has been to either take it off Wishlist and delete Goodreads ID book from LL or change Book ID on existing book to match Goodreads ID (which requires deleting the Goodreads ID book)

philborman commented 5 years ago

Ok thanks, that narrows it down a lot. A few things to check though if you don't mind...

"War of the Worlds" is in LL, what's its Status? If "Open" then we correctly picked it up from calibre. The bookid is irrelevant at this point: calibre and lazylibrarian use different bookids (we use goodreads id, they use an index position in their database).

As "War of the Worlds" is in the lazylibrarian database, why doesn't the wishlist recognise it? It's not as simple as the bookid not matching. Goodreads can have several bookids for the same book (different editions) so our matching code is to take the wishlist entry and...

Look for wishlist bookid in database
  If not found, look for wishlist isbn in database
    If not found, look for wishlist Author/Title in database
      If not found, fuzzy match Author and Title with both parts >90% match
        If not found, create a new entry in database

so we try not to create extra entries if at all possible. The code that does this is finditem() in csv.py, described in an earlier response, and a debug log of a wishlist search should show what's going on, particularly with fuzzy debugging on (loglevel 128). I suspect we fail the fuzzy match threshold, but I can't reproduce it here.
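The matching steps above can be sketched roughly as follows. This is not the actual finditem() code: the function names, dictionary keys, and in-memory book list are hypothetical, and the standard library's difflib stands in for whatever fuzzy matcher LazyLibrarian really uses, so real scores may differ.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def find_item(wish, books, threshold=90):
    """Return (book, how_matched), or (None, "new") if nothing matches."""
    # 1. wishlist bookid
    for book in books:
        if wish.get("bookid") and wish["bookid"] == book.get("bookid"):
            return book, "bookid"
    # 2. wishlist isbn
    for book in books:
        if wish.get("isbn") and wish["isbn"] == book.get("isbn"):
            return book, "isbn"
    # 3. exact author/title (case-insensitive here, which is an assumption)
    for book in books:
        if (wish["author"].lower() == book["author"].lower()
                and wish["title"].lower() == book["title"].lower()):
            return book, "exact"
    # 4. fuzzy match, both author and title must score above the threshold
    for book in books:
        if (similarity(wish["author"], book["author"]) > threshold
                and similarity(wish["title"], book["title"]) > threshold):
            return book, "fuzzy"
    # 5. nothing found, caller would create a new entry
    return None, "new"


books = [{"bookid": "111", "isbn": "9780000000001",
          "author": "H. G. Wells", "title": "The War of the Worlds"}]
wish = {"bookid": "222", "isbn": None,
        "author": "H. G. Wells", "title": "War of the Worlds"}
print(find_item(wish, books)[1])
```

With difflib, "War of the Worlds" against "The War of the Worlds" scores just under 90, so this wishlist entry falls through to a new entry. That is the kind of near miss on the threshold that would produce a second copy of the same book.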

jcal22 commented 5 years ago

How do I turn on loglevel 128?

philborman commented 5 years ago

lazylibrarian config page, interface tab. Change value then save config.

philborman commented 5 years ago

Development has moved to GitLab. All open issues have been moved over to LazyLibrarian Issues. Please open any new issues or comments on the new repo, as the old one is no longer maintained. Thanks.