DobyTang / LazyLibrarian

This project isn't finished yet. Goal is to create a SickBeard, CouchPotato, Headphones-like application for ebooks. Headphones is used as a base, so there are still a lot of references to it.

General issues and questions. #1529

Closed Code-Slave closed 5 years ago

Code-Slave commented 6 years ago

I'm just typing up some things that are starting to creep up. Full disclosure: I hoard and read books and mags. Lots of them.

I have 700-800 books in the wanted status and 85 magazines, and I am starting to run into API limits on various nzb sites (2k-2500 api hits max per day). I'm wondering if there is a way to set the scan frequency for mags and books separately. I have no specific suggestions, just think for the bigger hoarders out there this will be an issue.

My ebooks list is almost 2k items, and when I go to that tab it's really starting to crawl. Should I look into this like I did with mags to speed it up? Just really looking to start a discussion. The API hits are becoming a problem as that's affecting other apps.

philborman commented 6 years ago

A few things you can do. For api limits we automatically block a provider for a number of seconds if we get an error of any sort, including api limit exceeded. The default is 3600 seconds (1hr). You can extend this in config.ini with blocklist_timer, so you could choose to block a provider for 24hrs until your limit is reset.

Mag search and book search have different timers in the lazylibrarian config processing tab. If your magazines are mostly monthly, maybe extend the timer so you only search once a week? For books it depends how prolific your authors are, maybe even longer?

On the ebooks tab, the part that takes the time is the rendering of the table, so paginating speeds it up hugely. Set the dropdown on that page to 10 to see the difference. The filter is run before the 10 are selected, so you still filter on the whole ebook list.

One thing I've been thinking about is to search for older books less often, so books that we've already searched for and not found get a lower frequency of search, unless you add new providers. Chances are if you couldn't find the book today, you won't find it on those providers tomorrow either (unless it's a new release). What do you think?
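A minimal sketch of that blocking behaviour, assuming nothing beyond what's described above (the function and variable names here are illustrative, not LazyLibrarian's actual code):

```python
import time

BLOCKLIST_TIMER = 86400   # seconds; e.g. raised from the 3600 default to 24h
blocked_until = {}        # provider name -> unix time when the block expires

def provider_available(name):
    # skip any provider still inside its block window
    return time.time() >= blocked_until.get(name, 0)

def block_provider(name, reason):
    # any error, including "api limit exceeded", blocks the provider for a while
    blocked_until[name] = time.time() + BLOCKLIST_TIMER
    print("Blocking %s for %d seconds: %s" % (name, BLOCKLIST_TIMER, reason))
```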

Code-Slave commented 6 years ago

I think the search older books less often is a good idea. One idea I had was to limit api hits in a day, say only allow 500 queries a day for nzb (configurable), and set up a sort of queue. Couple that with a timer for older books and that would cut down a LOT.

philborman commented 6 years ago

We can easily count daily api queries, but obviously only for ourselves; we have no idea what other programs might be accessing the same providers (sonarr, radarr etc) and eating into the same api limit. This is why I didn't go down that route; we just watch for getting blocked and don't use that provider again for a while. It gets quite messy otherwise.

Another problem with limiting by counter is that you don't know when the counter will be reset by the nzb provider. Some do it at midnight their time so you can download 500 just before midnight and another 500 just after, taking different timezones into account. Some monitor when you download and use a running count, so if you use your 500 in 24 hours but 10 of the downloads were 23 and a bit hours ago you can do some more downloading fairly soon, until you top the counter up to 500 again. Some of the torrent providers are even more complicated as they take your seeding into account and give you more downloads depending on your seeding credits. I haven't come across a downloader yet that penalises you for trying, so trying again after an hour might cause another "limit exceeded" response, but doesn't count against you.

I'm thinking for searches maybe we keep track of how many times we have searched for a particular book, resetting all the counters to zero if we add a new provider. If a book is newly published, or fairly new, search every time. If it's an older book, only search every n times, so the ones we can't find get looked for less and less often.
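A minimal sketch of that sliding scale, assuming hypothetical per-book fields (this is not the actual LazyLibrarian schema or code):

```python
def should_search(book, new_provider_added):
    # A book that has already failed n searches is only tried again
    # every (n + 1) cycles; new-ish books are searched every cycle.
    if new_provider_added:
        book["search_count"] = 0      # new provider: everything is worth another look
    if book["recently_published"]:
        return True
    return book["cycles_since_search"] >= book["search_count"] + 1
```

After each unsuccessful search the caller would bump search_count and reset cycles_since_search, so books that keep failing get searched less and less often.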

Code-Slave commented 6 years ago

I get the api hits issues. But if I can limit LL to 500 for example, then that's a known quantity. It's killing me right now as I'm hitting the api limit every day. I will change the frequency for sure.

Now regarding old books. I like the idea a lot as it would help. What is considered old? I have some books that for some reason don't get a publish date from goodreads. Is it by date and/or search count? Should we have an option of "Just scan old books"? Also, in the ebooks list you could have a dropdown with All, Downloaded (successful grab), and Older (or something) and allow search to target whatever is in the dropdown. This could help table draw speed too and speed up the queries for that page.

philborman commented 6 years ago

No date from goodreads is fairly common via the api; there is often more info on the individual book page, but it's expensive in api calls to do that for every book. At the moment we get all the author's books from the author page (one or two calls per author depending on how many books they have written) and then fill in some dates from series pages (one call per series gets book dates for the whole series). I added some code a few days ago so that books that aren't part of a series and don't have a date on the author page now get filled in by separate api calls, which keeps the extra calls to as few as possible. LL will gradually fill in the missing dates as it refreshes each author.
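Roughly, that ordering means only a small subset of books ever needs the expensive per-book call, as in this sketch (field names are made up, not the real schema):

```python
def needs_individual_lookup(author_books):
    """After the author page (one or two api calls) and one call per series
    have filled in what they can, only books with no series and still no
    date justify a separate per-book call."""
    return [b for b in author_books
            if b.get("date") is None and b.get("series") is None]

books = [{"title": "A", "date": "2001", "series": None},
         {"title": "B", "date": None, "series": "Foo"},   # date will come from its series page
         {"title": "C", "date": None, "series": None},    # needs its own api call
         {"title": "D", "date": "2018", "series": None}]
print([b["title"] for b in needs_individual_lookup(books)])   # ['C']
```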

The current code testing in my git doesn't check for "old" books, simply books we have searched for and not found yet. Might add an exception for new-ish books, see how it goes. You can force a scan for any book/books from the manage page or the ebooks page, so that takes care of targeted searches.

Table draw speed is quick if you use the "rows per page" dropdown as I said before, so that shouldn't really be a problem. I have 1900+ books but only show 15 per page and it takes about half a second redrawing.

Still not sure whether to add api limits, or how to limit properly.

philborman commented 6 years ago

New testing version in my git has api limits for each nzb/torznab provider in the config->categories page. Limits are reset on the next search after midnight local time.
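A rough sketch of what such a per-provider daily counter could look like (illustrative only, not the actual implementation); as noted further down, a limit of zero means unlimited:

```python
from datetime import date

api_hits = {}   # provider name -> {"count": searches so far today, "day": date}

def api_allowed(provider, limit):
    # Daily per-provider api limit; zero means unlimited.
    # The counter resets on the first search after midnight local time.
    if limit == 0:
        return True
    entry = api_hits.setdefault(provider, {"count": 0, "day": date.today()})
    if entry["day"] != date.today():
        entry["count"], entry["day"] = 0, date.today()
    if entry["count"] >= limit:
        return False
    entry["count"] += 1
    return True
```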

alagave commented 6 years ago

I see the logic for reducing API calls but every user is different. I have thousands of cookbooks in my collection and still manage to find older books as some kind soul digitizes one of those treasures. I see the same behavior in other non-fiction areas such as travel writing for example. Perhaps a genre-based metric could work.

I will open another thread about series but, pursuant to this thread, what if I have just a few missing books in a series? I would want to search for those at least once a day. Of course, torrents will require a new strategy to catch seeders that are only active on weekends or a few hours a day in their local timezone.

Also, please consider users that have signed up for unlimited API access. Perhaps a toggle control per NZB search provider?

philborman commented 6 years ago

Unlimited api access is easy. As with most lazylibrarian settings zero is off, so an api limit of zero is unlimited. Genres are tricky: they are very subjective, there can be multiple genres per book, goodreads doesn't even include them in their api, and we don't store them anyway :-) There is a column in the books table for genre, but it's never been used.

I think that to target missing books in a series, or any other missing books for that matter, a sliding scale should work quite well: if the downloaders don't have the book today they are unlikely to have it tomorrow either (might not apply to a newly published book). As for seeders only active infrequently, I don't know how you find them. LazyLibrarian will reject torrents with too few seeders (configurable, zero to disable) and you can set the "delete failed download" task to more than 24 hours? Haven't tried, but an error before this will still cause an abort, so it would probably work. We will just think the download is very slow.

Code-Slave commented 6 years ago

Will play with this this weekend. Work is stupid right now.

I like the api limit per indexer, though of course using Jackett nullifies that, so I would have to be sure not to. Interested to see how that goes.

The more I think about filtering based on the number of unsuccessful tries (count), the more I like it. Maybe a configurable threshold on publish date too? So if it hits the threshold and the publish date is before 2012, throttle to once a month? Just thinking out loud here. At minimum, just throttling by count would help a ton.

alagave commented 6 years ago

I noticed that and disabled API limits on some providers but forgot I had done so.

As for catching occasional seeders, I tried searching on a floating 2-hour window: 12pm on day 1, 2pm on day 2, etc. Weekends are another time when those guys pop up.

Code-Slave commented 6 years ago

Adding on to this a bit. DB stuff. I'm still having a good bit of slowdown (running on dual 16-core Xeons with 4 cores given to the docker) so it's not a machine perf issue. First entry into books and mags is really slow, like 25 seconds slow.

Looking at the languages table there are almost 2200 rows. Is that excessive? Basically I'm looking for issues that may be causing joins to slow down. My db is only 9MB so not crazy big. Just looking at possible issues. Any tips?

philborman commented 6 years ago

Something odd. I'm only on a lowly raspberry pi and get pages in under a second. The languages table isn't relevant here; it's used to look up language from isbn when you do a libraryscan or import a new book, not on regular page draws.

Can you try setting the debug level to 8480? It's a bitmask that will turn on logging for admin (8192), serverside processing (256) and database comms (32). You should then get log lines like this when you show the book page...

2018-08-07 19:49:04 | DEBUG | WEBSERVER | webServe.py | getBooks | 1798 | getBooks filtered 1877 from 1877:10
2018-08-07 19:49:04 | DEBUG | WEBSERVER | webServe.py | getBooks | 1797 | getBooks Books returning 0 to 10, flagged 1,0
2018-08-07 19:49:04 | DEBUG | WEBSERVER | webServe.py | getBooks | 1709 | Sortcolumn 1
2018-08-07 19:49:03 | DEBUG | WEBSERVER | webServe.py | serve_template | 145 | User admin: 65535 books.html

Hopefully a short log extract like this from yours will show where the time is being taken up. My extract above shows the book page requested at 19:49:03, sorted on column 1 (author name) a second later, then no filter applied and the first 10 results of 1877 shown before another second elapsed. Yours might show a delay in selecting, sorting or filtering. If not we might need to dig a little deeper.
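For reference, 8480 is just those three flag values combined (the constant names here are mine, not LazyLibrarian's):

```python
ADMIN      = 8192   # admin logging
SERVERSIDE = 256    # serverside processing
DATABASE   = 32     # database comms

print(ADMIN | SERVERSIDE | DATABASE)   # -> 8480
```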

Code-Slave commented 6 years ago

I killed and reloaded the docker image, and things seem to be going well now. I'm wondering if there was a corrupted or screwed-up container. Thanks for the bitmask, it will be handy.

Code-Slave commented 6 years ago

I have tracked the db stuff down a bit. I'm getting a lot of database locks. I think maybe multiple processes running at the same time is causing it. Still researching a bit.

philborman commented 6 years ago

I don't see any database locks here, even with multiple processes. Could be hard to track down. Is it something docker related? Do you have to use docker, or can you try running lazylibrarian outside docker to eliminate that as the cause? There were some reports of odd behaviour with docker a while ago that turned out to be docker running out of memory or stack (not sure which); the original thread is here: https://github.com/DobyTang/LazyLibrarian/issues/1174 Maybe this is something similar?

philborman commented 5 years ago

Any update on the database locks issue? I've just been revisiting that section of code and moved one of the locks to make "upsert" atomic, where it was previously possible for another thread to jump in between the "update" and "insert" calls. It looks like it would be fairly easy to log what's holding the lock if it's a repeatable issue?
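The race and the fix look roughly like this minimal sqlite sketch (table, column and lock names are made up, not LazyLibrarian's actual code):

```python
import sqlite3
import threading

db_lock = threading.Lock()

def upsert_book(conn, bookid, status):
    # Holding the lock across both statements means no other thread can
    # slip an INSERT in between the UPDATE and the fallback INSERT.
    with db_lock:
        cur = conn.execute("UPDATE books SET status=? WHERE bookid=?",
                           (status, bookid))
        if cur.rowcount == 0:
            conn.execute("INSERT INTO books (bookid, status) VALUES (?, ?)",
                         (bookid, status))
        conn.commit()
```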

Code-Slave commented 5 years ago

I was looking at it again a few days ago; I can't repeat the problem with any consistency. The majority of the time it's when it's looking for books/mags and I'm going into that specific section, so searching for mags while I'm in mags. I will say that lately it's not been so much of an issue.