evilhero / mylar

An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents
GNU General Public License v3.0

Mylar Memory and CPU usage #1446

Closed: prashantak closed this issue 3 years ago

prashantak commented 7 years ago

Mylar is continuously using up to 15% of CPU.

Memory usage also increases over time; I've noticed up to 2 GB of RAM usage over a two-day period.
Restarting Mylar brings it down, but it keeps on increasing again.

My build : 2b0b28bcec8410909cfd84a2df3616f096890af6

Windows 10 64bit, Python 2.7.9

evilhero commented 7 years ago

Can you confirm how many series you have on your watchlist, as well as how many issues you have on your Wanted list?

I have a feeling that the RSS check is firing off too quickly while a background full search is being performed.

That, or the logs are the culprit and there's too much information being written to them for the system to keep up and roll over the logs properly. Do your log files show the current running instance, or are they from last week or something and not actually current?

prashantak commented 7 years ago

Series watching: 296, Issues wanted: 867

The logs show the current instance, but they are definitely big: about a million lines right now.

evilhero commented 7 years ago

It could very well be due to the number of wanted issues. Every 20 minutes all 867 issues get checked against the RSS feeds, so if a pass isn't finished before the next cycle is due, the next one waits and only starts once the first has finished, and so on. Then every 6 hours (or when you perform a Force Search from the Wanted tab) it does an RSS + API search for each wanted issue, which, based on your Wanted list size, would take at a minimum over 14 hours to complete with just one provider, and longer if you have a few.
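To give a rough sense of where that 14-hour figure comes from, here's a back-of-envelope sketch; the roughly one-minute pause per search is an assumption for illustration, not Mylar's exact internal delay:

```python
# Back-of-envelope estimate of a full Wanted-list search cycle.
# The 60-second pause between searches is an assumed value for illustration;
# Mylar's actual per-provider delay may differ.
wanted_issues = 867
providers = 1
pause_seconds = 60  # assumed delay between searches to avoid hammering indexers

total_seconds = wanted_issues * providers * pause_seconds
print("approx. %.1f hours per full search cycle" % (total_seconds / 3600.0))
# -> approx. 14.5 hours with a single provider
```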

I do plan on implementing a tiered wanted list, where if an issue has been Wanted for more than 2 weeks it falls into a much less frequent search pattern, which should solve most of these types of problems (the excessive log sizes being the other thing that needs to be addressed).

rboylesDev commented 7 years ago

I can't speak to the high CPU, but I am in the process of setting up a new Docker (linuxserver/docker-mylar) instance of Mylar and I'm seeing what I think is high memory use with just an empty instance.

I just spun it up and it's sitting at 287 MB of RAM. That seems like a lot, especially compared to my Sonarr instance, which is only using 195 MB, and I know that runs on top of Mono, which is not very efficient on Linux.

evilhero commented 7 years ago

Well, again it depends on what Mylar is currently doing - the more it does (and it does a lot on startup, grabbing new pulls, updating existing series, etc.), the more it will consume.

From my ubuntu machine:

123.0 MiB +  38.3 MiB = 161.3 MiB       python2.7 (3)
457.3 MiB + 882.0 KiB = 458.2 MiB       sabnzbdplus
  1.4 GiB +   3.0 MiB =   1.4 GiB       mono-sgen

That's with two forked Mylar instances running, over 600 series in my watchlist, and with it actively doing stuff, so I'm not sure there's much of a memory problem in that regard.

I could be wrong of course, but just speaking from my personal usage as I can monitor/test that.

11001010111111101011101010111110 commented 6 years ago

I am having a similar issue; recent versions of Mylar under Docker are using way more resources than any other container.

(screenshot attached)

evilhero commented 6 years ago

Unfortunately I can't provide any immediate insight into this.

Again, it depends a lot on what is happening on your own system with regards to Mylar. If you have 1000 issues marked as Wanted, it loads all the required info for all 1000 issues into memory and then iterates over each (if you're doing a force search, or the 6-hour search interval hits). Depending on the number of indexers being used, this could go on for quite a while, because it has to pause between searches so as not to hammer the sites. It also does frequent backend updates to the series on your watchlist to ensure they're up to date, as well as pull-list refreshes (make sure you're using alt_pull=2 in the config.ini, and not any other value).

If you have folder monitor running every minute or two, and there are quite a few items in the monitored folder that it has to iterate over in order to find something new (e.g. if your post-processing action is set to copy because you're seeding), it can cause spikes in usage. It's a pretty intensive check, since it compares CRC values and other items to make sure it's not something that's already been post-processed.
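To illustrate why that check gets expensive, here is a minimal sketch of the idea; this is not Mylar's actual code, and the function names and layout are just for illustration:

```python
import os
import zlib

def file_crc(path, chunk_size=1024 * 1024):
    """Compute a CRC32 of a file by streaming it in chunks."""
    crc = 0
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, '08x')

def find_new_files(monitor_dir, already_processed_crcs):
    """Return files whose CRC isn't in the set of already post-processed items.

    Reading and hashing every file on each pass is why a busy monitored folder
    (e.g. post-processing set to 'copy' while seeding) causes periodic CPU spikes.
    """
    new_files = []
    for root, _dirs, files in os.walk(monitor_dir):
        for name in files:
            path = os.path.join(root, name)
            if file_crc(path) not in already_processed_crcs:
                new_files.append(path)
    return new_files
```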

I'm all for fixing memory problems and such, and I'm trying to address the constant usage due to large searches of the Wanted list, but it's a bit more difficult than just changing how the iteration flows. Plus, time tends to be a factor too; as this is a hobby, I have to limit how much time I spend on Mylar due to personal life.

If you can provide any further insight as to what you're doing at the time of cpu spikes/usage it would help narrow things down, but aside from what I mentioned above, it's hard to pinpoint.

11001010111111101011101010111110 commented 6 years ago

With these stats:

- Series you're watching: 282
- Issues you're watching: 2984
- Issues you actually have: 2683
- Issues wanted: 229

I adjusted a few parameters, including:

1. NZB Search interval: 360 minutes (not sure what the default is)
2. RSS Check interval: 300 minutes (not sure what the default is)
3. Disabled 32Pag.es
4. Disabled Experimental Search

I am guessing it's (2) that did it, but here is the result:

(screenshot: Grafana "Docker containers" dashboard)

evilhero commented 6 years ago

Yeah, those stats you provided are well within the normal range. The one that would definitely hit the CPU is the wanted-issues count, based on some other conversations with people.

As far as your parameter adjustments, that's interesting:

1. The search interval is the amount of time to wait between searches of your entire Wanted list (i.e. Force Check on the Wanted tab, which does both RSS + API). It defaults to 360 minutes, so if it was set to less than that it would have some serious overlapping timing issues (besides the fact that it shouldn't be set to less than 360, and Mylar should have caught that). What was it set to previously?
2. The default for RSS is 20 minutes. Setting it to 300 would mean some serious gaps in the RSS cache, but aside from that I wouldn't have thought it would affect anything to the point of hitting the CPU hard.
3/4. Both 32P & Experimental have no impact except when searching, aside from adding extra search loops when performing both options 1 & 2.
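To illustrate point 1, the overlap problem looks roughly like this; a minimal sketch, not Mylar's actual scheduler, and `search_wanted_list` is a stand-in for whatever kicks off the full search:

```python
import threading

search_lock = threading.Lock()

def run_search_cycle(search_wanted_list):
    """Run one full Wanted-list search, skipping the tick if the previous cycle is still going."""
    # If the interval is shorter than the time a full cycle actually takes,
    # every timer tick would otherwise pile up yet another full search.
    if not search_lock.acquire(False):  # non-blocking acquire
        print("previous search cycle still in progress, skipping this tick")
        return
    try:
        search_wanted_list()
    finally:
        search_lock.release()
```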

11001010111111101011101010111110 commented 6 years ago

So I set the RSS check to 20 minutes and saw a spike in CPU usage to what it was prior.

I'll watch to see if memory usage goes up dramatically as it had before.

When you say that checking less frequently than every 20 minutes would result in gaps, do you mean that issues wouldn't get picked up because they come and go from the RSS feed between refreshes, so Mylar wouldn't know a new issue is available to download? Isn't that information provided by ComicVine? Where does the RSS feed come from?

(screenshot: Grafana "Docker containers" dashboard)

evilhero commented 6 years ago

The RSS feeds come from your indexers. What the setting does is poll each of your indexers for their latest comics RSS feed. It then writes the data to the db (to cache results) and performs a basic search of your Wanted list against only this RSS data. So the more indexers you have within Mylar, the more data it retains to search against (probably in memory).

Normally setting it to longer than 20 minutes would result in some items being missed from the RSS feed, as most feeds only hold about 100 items. Not normally a problem, but on busy days (like Wednesdays) you might miss a few because of how many items get released. ComicVine has nothing to do with the RSS feeds - it only contains the series/issue data. The pull-list controls when Mylar knows something is new & can be searched/downloaded, and even that isn't really using ComicVine for population data.
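The flow is roughly: poll each indexer's feed, cache the entries locally, then match the Wanted list against that cache only. A minimal sketch of that idea, assuming feedparser and a simplified cache table rather than Mylar's actual schema:

```python
import sqlite3
import feedparser  # assumed to be installed; Mylar's real feed handling is more involved

def poll_feed_into_cache(db_path, provider_name, feed_url):
    """Fetch one indexer's comics RSS feed and cache its entries locally."""
    # Most feeds only hold about 100 items, so polling less often than the
    # feed turns over (e.g. on a busy Wednesday) means entries get missed.
    feed = feedparser.parse(feed_url)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS rss_cache "
                 "(provider TEXT, title TEXT, link TEXT, UNIQUE(provider, link))")
    for entry in feed.entries:
        conn.execute("INSERT OR IGNORE INTO rss_cache (provider, title, link) "
                     "VALUES (?, ?, ?)",
                     (provider_name, entry.get('title', ''), entry.get('link', '')))
    conn.commit()
    conn.close()

def match_wanted_against_cache(db_path, wanted_title):
    """Search only the cached RSS entries for a wanted title (no API hit)."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT provider, title, link FROM rss_cache WHERE title LIKE ?",
                        ('%' + wanted_title + '%',)).fetchall()
    conn.close()
    return rows
```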

11001010111111101011101010111110 commented 6 years ago

So if I only have one search provider selected, say DogNZB, it will only fetch that one RSS feed every 20 minutes. What's weird is that I didn't even have DogNZB checked anymore; not sure why.

evilhero commented 6 years ago

Yes, if you enable the RSS option and set it to 20 minutes. If, with dognzb, you happen to go over your API limit or it returns a 'down' type response (because they're down for maintenance or whatever), Mylar will automatically disable the provider so you don't keep hammering them while they're in that state. At this point, re-enabling the provider is a manual option, but it will become a timed re-enable in the near future (i.e. it will re-enable itself after 3 hours or so).
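A minimal sketch of that disable/re-enable behaviour, with the 3-hour window as an assumed placeholder; this is not Mylar's internal code:

```python
import time

DISABLE_SECONDS = 3 * 3600  # assumed cooldown; the real value isn't settled yet

providers = {'dognzb': {'enabled': True, 'disabled_at': None}}

def record_provider_failure(name):
    """Disable a provider after an API-limit or 'down' response so it isn't hammered."""
    providers[name]['enabled'] = False
    providers[name]['disabled_at'] = time.time()

def provider_available(name):
    """Report whether a provider can be used, re-enabling it once the cooldown passes."""
    p = providers[name]
    if p['enabled']:
        return True
    if p['disabled_at'] is not None and time.time() - p['disabled_at'] > DISABLE_SECONDS:
        p['enabled'] = True
        p['disabled_at'] = None
        return True
    return False
```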

11001010111111101011101010111110 commented 6 years ago

Still having a lot of issues, to the point where it only downloads comics when I restart the app. My settings must be completely broken. What file can I look at for default settings, @evilhero ?

evilhero commented 6 years ago

Defaults are loaded into memory automatically if a setting isn't specified. If, however, it only searches when it's restarted, it sounds like some error is occurring on the backend. Are there any traceback errors in the logs? Are you up to date on whichever branch you're running?

11001010111111101011101010111110 commented 6 years ago

I am on the latest branch, AFAIK:

Mylar Version: master
-- git build 81252f3ebb42b1821c9ea66b8d258410a024fa8d.
Python Version : 2.7.14

Anything special I could look for, traceback wise?

I only see DEBUG, INFO & WARNING level items in the log; no ERROR, if that's expected.

rupaschomaker commented 6 years ago

Regarding memory and CPU: I was using NZBHydra, but not the magic name that Mylar uses to recognize NZBHydra, so it was doing an empty query with no category for the RSS feed. This resulted in over 2 million records in the RSS table. Let's just say that the way Mylar processes that result set was not designed for efficiency with that many rows.

So, once I cleared out that table, vacuumed the SQLite database, and fixed the name to nzbhydra in Mylar, I haven't had performance issues.
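For anyone wanting to do the same cleanup, here's a hedged example using Python's sqlite3 module; it assumes the default mylar.db filename and the rssdb table name mentioned later in this thread, so adjust for your install, stop Mylar, and back up the database first:

```python
import sqlite3

# Assumes Mylar is stopped, a backup exists, and 'mylar.db' / 'rssdb' match your install.
conn = sqlite3.connect('mylar.db')
conn.execute("DELETE FROM rssdb")  # clear the bloated RSS cache rows
conn.commit()
conn.execute("VACUUM")             # reclaim the freed space and shrink the db file
conn.close()
```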

One tool worth looking into is pyflame (https://github.com/uber/pyflame). It can give you a really good idea, at the Python level, of where your CPU time is being spent, without the huge performance impact of Python's built-in profiler. It won't help with memory, but it might help you track down the CPU usage.

evilhero commented 6 years ago

@rupaschomaker I'm curious as to what you had for a newznab entry for your nzbhydra that caused it to do an open query. From what I remember, using nzbhydra for the name (not the host) in square brackets enabled a local bypass for queries, but that was all it did (it's been a while, but that was part of it if nothing else).

The RSS (and almost everything else) probably should have some indexes built so that queries would be more optimal, as there aren't any at all, but I honestly haven't even looked at anything like that as of yet.

11001010111111101011101010111110 commented 6 years ago

Is there a table I can take a look at that could reveal issues?

Still only looks for new issues on restart

rupaschomaker commented 6 years ago

I think I just had 'hydra' as the name, which from the code should just fall back to the API using the same config as nzbhydra. [looking at code] -- yup. I would have to experiment some to see if I could duplicate it, but the NZBHydra logs were showing an open API search with no category set.

08-Jul-2018 08:34:01 - DEBUG :: mylar.nzbs.470 : Thread-13 : [RSS] (nzbhydra) now being updated...
08-Jul-2018 08:34:01 - INFO :: mylar.nzbs.502 : Thread-13 : [RSS] (nzbhydra) 0 entries indexed.

When I force an RSS search I'm just getting 0 entries indexed. Dunno what's up with that.

I don't think the issue was with indexes. It seemed to be probing each wanted item against all the entries in the RSS list (or it looked that way from the logs). Looking at the code again, that obviously isn't what's happening; the query against rssdb does limit by title, so it's not as bad as it looked from the logs.

mediabot commented 6 years ago

I was wondering, is there anything I can add to this debug-wise? I just had to kill Mylar because it was taking up 15% of the server's memory. There are 17 comics being monitored, with only 45 issues missing. It seems really odd that it would require a gigabyte of memory for so few items.

barbequesauce commented 6 years ago

It's being looked at, although it's not something that lends itself to rapid diagnosis.

How long had your copy been running? What branch/commit was it? If you're using folder monitor, how many objects are in that directory and its subdirectories? What's listed in your "Bragging Rights" table in the top right corner of the front tab of the config page?

mediabot commented 6 years ago

Uptime was a little over 1 day; Mylar starts automatically with systemd.

Version: 81252f3ebb42b1821c9ea66b8d258410a024fa8d (master), OS: Ubuntu 18.04

Bragging Rights (not really much here, and I would be worried to monitor more at this rate):

- # of Series you're watching: 17

- # of Issues you're watching: 254

- # of Issues you actually have: 180

evilhero commented 6 years ago

Try updating to the latest commit. Both master & development have an entirely new way of handling searching and post-processing requests, which alleviates a lot of the memory requirements of the previous approach.

Not saying it will help, but it definitely wouldn't hurt to be on the latest build so we're all working from a similar methodology.

Zachar2 commented 5 years ago

I have the same CPU+RAM issue as the OP despite a fresh install and having 0 items being downloaded or watched. I only added 10 Torznabs from Jackett to the config. Like the OP, I'm on Windows 10 64-bit, Python 2.7.15.

Zachar2 commented 5 years ago

This was fixed recently. Thank you for that.