DobyTang / LazyLibrarian

This project isn't finished yet. Goal is to create a SickBeard, CouchPotato, Headphones-like application for ebooks. Headphones is used as a base, so there are still a lot of references to it.

Enhancement: #1645

Closed Code-Slave closed 5 years ago

Code-Slave commented 5 years ago

Have a stats page, off the main menu or in config, with counts of various things.

Books: total, books wanted, books ignored, books with no description

Authors: total, authors ignored, authors with no metadata, authors with no books

Same for audio.

Mags (maybe not much else, as there isn't a lot of info for a mag)

philborman commented 5 years ago

Could be done as a popup in config like the job_status button?

Code-Slave commented 5 years ago

yes, unless you want to link in some way, say to a list of all books with no description etc

philborman commented 5 years ago

Basic stats popup added. Most of the lists are available from the api already, I think.

yabdali commented 5 years ago

Thanks, need to check those numbers to find out the optimal configs.

[Screenshot of the stats popup]

philborman commented 5 years ago

Cache stats and sleep times are only relevant on a libraryscan; they show how efficient the caching is. The interesting bits are...

99 authors overdue = their cached data is older than the config setting. If you run lazylibrarian 24/7 this should reduce to zero over time.

No ISBN and No Lang maybe don't matter, not all books have that info, but often if that data is missing we are missing other data too, probably incomplete info from goodreads.

No Description is more of a puzzle. Seems goodreads don't always include that info in the author's list of books any more. I think they used to almost always have the info, now about 1 in 3 of my library is missing a description. Need to look into what we can do about it.

7 blank series = 7 series where we have a series title but no list of books in the series.

5 blank authors = same, we have the author details but no list of books for them.

Magazines: you seem to have 1 issue, but no magazine titles and no empty titles, so it looks like there is a magazine issue left in the database where the parent was deleted?
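To make the "blank" and orphan counts concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names are hypothetical, chosen only for illustration; they are not taken from the actual LazyLibrarian schema.

```python
import sqlite3

# Hypothetical schema for illustration only; the real tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id TEXT PRIMARY KEY, author_id TEXT NOT NULL, description TEXT);
    CREATE TABLE magazines (title TEXT PRIMARY KEY);
    CREATE TABLE issues (issue_id TEXT PRIMARY KEY, title TEXT NOT NULL);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [("a1", "Author One"), ("a2", "Author Two")])
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [("b1", "a1", "A blurb"), ("b2", "a1", None), ("b3", "a1", "")])
# An issue whose parent magazine row no longer exists.
conn.execute("INSERT INTO issues VALUES (?, ?)", ("i1", "Deleted Magazine"))


def count(sql):
    """Run a COUNT(*) query and return the single integer result."""
    return conn.execute(sql).fetchone()[0]


stats = {
    # Books with a missing or empty description.
    "no_description": count(
        "SELECT COUNT(*) FROM books WHERE description IS NULL OR description = ''"),
    # Authors we have details for but no books.
    "blank_authors": count(
        "SELECT COUNT(*) FROM authors WHERE author_id NOT IN (SELECT author_id FROM books)"),
    # Magazine issues left behind after the parent title was deleted.
    "orphan_issues": count(
        "SELECT COUNT(*) FROM issues WHERE title NOT IN (SELECT title FROM magazines)"),
}
print(stats)
```

The orphan query would explain a popup showing 1 issue with no magazine titles: the issue row survives but nothing in the parent table matches it.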

philborman commented 5 years ago

Seems the missing book data is a terms-of-service issue, they can serve the data over html but not api, depends on where they sourced the info. It's described here... https://www.goodreads.com/topic/show/400976-book-blurb

yabdali commented 5 years ago

I had ticked "Increase delay for previously failed searches" in the config page a few weeks ago, most probably that's why it shows overdue! I went back to Manage eBooks and changed the status of the "Wanted" books to "No Delay". Do you think this will reset the delay and look up those overdue ones in the next search job?

philborman commented 5 years ago

The "increase delay" setting is for books we searched for but couldn't find, ie books marked "Wanted" that the search tasks didn't find for download. If they couldn't find them today they are unlikely to find them tomorrow, so we increase the delay and only search for that book every Nth time. Details in Manage page, eg delay 3/5 means we only search every 5th time and we have skipped the last 3. If the book still isn't found when we next search, the 5 increases to 6 so the delay gets longer each time.

Resetting to "No Delay" will mean the book is searched for on the next run but is only temporary, so if the book isn't found you will try again with a small delay. If you turn "increase delay" off we don't check the counters and search every time.
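The counter behaviour described above can be sketched roughly as follows. This is not the LazyLibrarian code; the class and method names are hypothetical, and the exact counter semantics (an interval of N read as "skip N-1 runs, search on the Nth") are an assumption based on the "delay 3/5" description.

```python
from dataclasses import dataclass


@dataclass
class SearchDelay:
    """Hypothetical per-book delay counter, shown in the Manage page as skipped/interval."""
    skipped: int = 0   # runs skipped since the last real search
    interval: int = 0  # search every Nth run; 0 means "No Delay"

    def should_search(self, increase_delay: bool = True) -> bool:
        # With "increase delay" turned off the counters are not checked at all.
        if not increase_delay:
            return True
        # Assumption: this run counts as run number skipped + 1.
        if self.skipped + 1 >= self.interval:
            return True
        self.skipped += 1
        return False

    def record_failure(self) -> None:
        # Book still not found: lengthen the delay and start counting again.
        self.interval += 1
        self.skipped = 0

    def reset(self) -> None:
        # "No Delay": search on the next run; a later failure starts a small delay.
        self.skipped = 0
        self.interval = 0


delay = SearchDelay(skipped=3, interval=5)
print(delay.should_search())  # 4th run since the last search, still skipped
print(delay.should_search())  # 5th run, search is due
delay.record_failure()
print(delay)                  # delay is now 0/6
```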

The "Authors overdue" is where your cache expiry is say 30 days, and you have 99 authors whose book lists are older than this. We refresh one author every so often, depending on how many authors you have and the cache expiry. Details of who is next and when are in the config "Job Status" button
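As a rough illustration of the "authors overdue" figure, the sketch below picks out authors whose cached data is older than the expiry and orders them oldest first. The function names are hypothetical, and spreading refreshes evenly over the expiry period is only one plausible way to schedule them, not necessarily what LazyLibrarian does.

```python
from datetime import datetime, timedelta


def overdue_authors(last_updated, expiry_days, now):
    """Return names of authors whose cache is older than expiry_days, oldest first."""
    cutoff = now - timedelta(days=expiry_days)
    overdue = [name for name, stamp in last_updated.items() if stamp < cutoff]
    return sorted(overdue, key=lambda name: last_updated[name])


def refresh_interval(expiry_days, author_count):
    """Assumed schedule: refresh one author at a time, evenly spread over the expiry period."""
    return timedelta(days=expiry_days) / max(author_count, 1)


now = datetime(2020, 1, 31)
cache = {
    "Author A": datetime(2019, 12, 1),
    "Author B": datetime(2020, 1, 20),
    "Author C": datetime(2019, 11, 1),
}
print(overdue_authors(cache, 30, now))
print(refresh_interval(30, len(cache)))
```

Running 24/7 with a schedule like this, every author gets refreshed within one expiry period, which is why the overdue count should fall to zero over time.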

yabdali commented 5 years ago

I tried using the API through the browser with some of the books I have in LL without a description, but got an empty description tag. I guess, as you said, it's to do with TOS from 3rd parties.

https://www.goodreads.com/book/show/20446068-quicklet-on-geoffrey-a-moore-s-crossing-the-chasm?key=XXXXXXXX&format=xml

https://www.goodreads.com/book/show/24964462.how-to-govern-anything?key=XXXXXXXXX&format=xml

Returned:

true

<description/>

Is it possible to include the ability to modify the description, similar to the cover, when I click on "manual" (http://abcd:5299/editBook?bookid=12345)?

philborman commented 5 years ago

Possible, I tried that a while ago but html formatting in the description was causing issues. We now have a popup that can display though, so might take another look at it.

Another option would be an api call to update description

yabdali commented 5 years ago

Thanks for the continuous support, really appreciate it. The popup is only for display, right? What about the API call to update the description, what command should I use? I tried help but I can't narrow it down to the exact command.

philborman commented 5 years ago

Yes the popup is only for display, but I can copy the way they edit the raw text, I think. There is no api call for this yet, but it would be easier to write an api call than an editor/popup, what do you think, would an api call be enough?

yabdali commented 5 years ago

The API call would be ideal only if the description is available from Goodreads, I presume. Can the API use alternative sources such as LibraryThing, using the provided API key, to look up the description?

I see a popup with pagination for selected items in the library with missing details as more ideal for fixing issues for 10s but not 100s of books. Perhaps something similar to how you currently provide an option to change the cover using GR, Google ISBN, etc.

philborman commented 5 years ago

Needs thinking about. Have not found a good way of getting the missing info yet. We could page-scrape the html from goodreads but it's against their tos. Could maybe use librarything and google instead

yabdali commented 5 years ago

Page scraping is going to be painful to maintain considering that GUI changes are unavoidable. I did some webpage scraping about 20 years ago and I understand the amount of effort required to keep it functional. You can add the description fetched initially to the editbook popup http://abcd:5299/editBook?bookid=xxxxx and give the user an option to select the available description by clicking either Goodreads, Google ISBN and so on, similar to the way they manually update the covers. Ultimately, the end user can copy the description from somewhere else and paste it into the description section, similar to how Calibre-Web enables users to modify the book info as part of the upload feature (https://github.com/janeczku/calibre-web/blob/master/cps/uploader.py)

I checked the Google API call; there's one book for which neither Google nor GR returns a description using the URL below. Most of the books I checked return a JSON response including the description. https://www.googleapis.com/books/v1/volumes?q=isbn:1614641420

philborman commented 5 years ago

That's very useful, thanks. I think I might just use the googleapis link for now, only if there is no description from goodreads. Maybe add an editor at a later date, but any description is better than none!
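A minimal sketch of that fallback, assuming the JSON has already been fetched and parsed: the Google Books volumes response carries the blurb under `items[].volumeInfo.description`, and it is only consulted when goodreads gave nothing. The function names and the sample data are made up for illustration.

```python
import json


def extract_description(response):
    """Return the first non-empty description in a Google Books volumes response, or None."""
    for item in response.get("items", []):
        description = item.get("volumeInfo", {}).get("description")
        if description:
            return description
    return None


def pick_description(goodreads_description, google_response):
    """Prefer the goodreads text; fall back to Google Books only if it is empty."""
    return goodreads_description or extract_description(google_response)


# Made-up sample responses, shaped like the real API output.
found = json.loads(
    '{"totalItems": 1, "items": [{"volumeInfo": {"title": "Example", '
    '"description": "A short blurb."}}]}'
)
nothing = json.loads('{"totalItems": 0}')

print(pick_description("", found))            # falls back to Google Books
print(pick_description("From GR", found))     # goodreads text wins
print(pick_description(None, nothing))        # no description anywhere
```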

Code-Slave commented 5 years ago

I agree about using different APIs. Manual is a perfect place to edit. This is where I was going with stats: show me all books with no desc, etc.

philborman commented 5 years ago

Couple of problems. Googlebooks api keeps giving me 403: forbidden errors. Seem to get useful results for a while then they say "dailyLimitExceeded", limit is 1000 hits per day, so we will have to drip-feed the updates. Maybe only do it on a scheduled author refresh, and keep track of the failed state so we stop until the counter resets.

Other problem is we would need an editor widget for the manual edit page if we want to edit the book description. We only have a line editor at the moment and it doesn't like html which many page descriptions use.

philborman commented 5 years ago

We should be tracking the 403 errors now and not trying googlebooks until the daily quota is reset. It's a bit complicated as they reset them all at midnight Pacific time, so we have to calculate the time difference, and I'm not sure if that's Pacific time with or without DST?

I have also added a basic editor widget for the book descriptions. You can cut/paste from other pages and include html

yabdali commented 5 years ago

Thanks for the update, I can see the description editor and it looks great.

In regard to the Google API, the example I shared doesn't use a key, so it might be using an IP address filter to limit the daily requests.

As for the time/dst, you can use https://www.programmableweb.com/api/worldtime "API can also return information on whether a time zone is currently in Daylight Savings Time (DST), when DST starts and ends, and the UTC offset". You can store the requests count per date and check if date_last = date_current AND API_Requests <1000

Hope this may help...

philborman commented 5 years ago

If you don't give an api key google uses a much lower limit (ip based, maybe 100 per day, not sure). The error message says "Daily Limit Exceeded. The quota will be reset at midnight Pacific Time (PT)" but I don't know if that's PDT or PST, assume PST?

The current code just looks for the 403 error and blocks until next PST midnight. No point in counting requests as we might not be the only program calling google
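Roughly, the idea is something like the sketch below. It is not the actual LazyLibrarian code; the names are hypothetical and it assumes a fixed UTC-8 offset for PST, ignoring daylight saving. If the reset turns out to follow PDT in summer, `zoneinfo.ZoneInfo("America/Los_Angeles")` could replace the fixed offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "midnight Pacific Time" means PST, a fixed UTC-8 offset with no DST.
PST = timezone(timedelta(hours=-8), "PST")


def next_pst_midnight(now_utc):
    """Return the next midnight PST after now_utc, expressed in UTC."""
    now_pst = now_utc.astimezone(PST)
    midnight = now_pst.replace(hour=0, minute=0, second=0, microsecond=0)
    return (midnight + timedelta(days=1)).astimezone(timezone.utc)


class QuotaGate:
    """Hypothetical helper: stop calling googlebooks after a 403 until the quota resets."""

    def __init__(self):
        self.blocked_until = None

    def allowed(self, now_utc):
        return self.blocked_until is None or now_utc >= self.blocked_until

    def record_status(self, status_code, now_utc):
        # No request counting: other programs may share the quota, so only the 403 matters.
        if status_code == 403:
            self.blocked_until = next_pst_midnight(now_utc)


gate = QuotaGate()
now = datetime(2020, 1, 15, 20, 0, tzinfo=timezone.utc)  # noon PST
gate.record_status(403, now)
print(gate.allowed(now))    # blocked for the rest of the PST day
print(gate.blocked_until)   # next PST midnight, shown in UTC
```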

philborman commented 5 years ago

Seems google also blocks you if you are behind a vpn and it can't determine your geolocation, but they let you specify country=US or whatever, so added a config option for that.
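For anyone testing by hand, the request URL with the optional key and country can be built like this. A small sketch only: the function name is made up, and the `key` and `country` query parameters are the ones mentioned in this thread.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/books/v1/volumes"


def volumes_url(isbn, api_key=None, country=None):
    """Build a Google Books ISBN lookup URL, adding key and country only when given."""
    params = {"q": f"isbn:{isbn}"}
    if api_key:
        params["key"] = api_key
    if country:
        params["country"] = country  # e.g. "US", for when geolocation fails behind a vpn
    # Keep the colon in "isbn:..." readable instead of percent-encoding it.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


print(volumes_url("1614641420"))
print(volumes_url("1614641420", country="US"))
```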