RandomNinjaAtk / docker-lidarr-extended

lidarr-extended :: Lidarr application packaged with multiple scripts to provide additional functionality
GNU General Public License v3.0
275 stars 24 forks

[Bug]: "Rescan folders" takes many hours when using script #132

Closed steve1977 closed 1 year ago

steve1977 commented 1 year ago

There is a closed report describing a similar (possibly the same) issue (https://github.com/RandomNinjaAtk/docker-lidarr-extended/issues/70). Re-opening here as I am not sure whether closed reports are being read. @Bait-Fish

The issue is that somehow a "full rescan" is triggered, which takes many hours due to the size of the library.

The issue is replicable and seems to be related to the script. Let me share when it happens and also when it does not happen:

1) Start lidarr-extended docker

2) Restart docker

3) Restart docker again

I also noticed that "refresh artists" sometimes kicks in when using the script, which may or may not be related.

steve1977 commented 1 year ago

One thought: it may be related to "refresh artists". If I run "rescan folders" before I run "refresh artists", it completes within minutes. If I run "refresh artists" and then "rescan folders", then "rescan folders" takes ages.

When I turn off the script, the issue goes away. It seems that the script may mark all artists as "changed" or something like this?

Bait-Fish commented 1 year ago

This has continued, and worsened, for me. The last rescan took >24 hours; the UI only shows "a day ago", so I could not keep good track. And the library hasn't changed much. The rescans have been getting longer and longer, just as the Audio script indexing has.

I'll look into your steps and see if I have the same behavior. My experience is the rescan happens every 24 hours. I've gotten used to watching for it, then restarting the container. Generally that would end it for the remaining day. Then, wait for the Audio script to reindex for 2.5 hours before it starts pulling down again.

Edited for [missing] word

RandomNinjaAtk commented 1 year ago

@Bait-Fish I'm going to work on changing the process for what you call re-indexing to speed it up.

Honestly, for the scan issue, I have no idea what would be causing it. My only thought would be to disable all scripts in Lidarr's Settings -> Connect page and see if that improves it. Because with those scripts disabled, the files are only ever touched by Lidarr at that point...

RandomNinjaAtk commented 1 year ago

@Bait-Fish - Give the latest update (1.0.289) a try, this should fix the indexing issue. Hasn't been tested much at all though....

Bait-Fish commented 1 year ago

Updated and running 289 now. Interesting. Looks like it may grab missing albums as it processes? I like. I'll give it time to get some data back to you.

What is the preferred paste site, pastebin? Been a while . . .

And, my media is not pristine. It has grabs from other sources. Somehow I accidentally had some media get dumped into the Artist folder without being subfoldered. But this is not a new problem; they are old files. I'm manually deleting as I have time. But this has not sped anything up.

Maybe I can peer into the data and see what these 56,000 items are. I just don't know how to find that. Possibly then I can target a problem, like similar repeats or some errata. Whenever that number has changed, it has happened quickly and unexpectedly, bulking up by a few thousand each of the few times it occurred. Maybe then I could altogether remove a problem artist?

And, I did not get to try any of the suggestions by @steve1977, opting to go with your update first for feedback.

Thanks as always.

RandomNinjaAtk commented 1 year ago

Pastebin should be fine.

Bait-Fish commented 1 year ago

Audio 289 log so far. Seems to work fine. Much more satisfying than waiting hours for any progress. https://pastebin.com/PENPRbGM

Audio 282 recent scan time as an example of what was. https://pastebin.com/4PJwmGg4

Another random observation, unrelated. In the last week or two, the script importing at start seems to take longer. The LE webui takes longer to come up as a consequence. Not sure why it would take 3 minutes, as of this morning.

RandomNinjaAtk commented 1 year ago

What do you mean by importing scripts?

Bait-Fish commented 1 year ago

Before the Audio and Video scripts are given the 2-minute wait, this shows:

11/20/2022 8:43:30 AM Removing previous scripts...
11/20/2022 8:43:30 AM Importing extended scripts...
11/20/2022 8:46:14 AM Complete...

RandomNinjaAtk commented 1 year ago

It's the permissions fix that is causing that: https://github.com/RandomNinjaAtk/docker-lidarr-extended/blob/6f716420a6092fd8fa07fdf9a88466ceb53c8be4/root/etc/cont-init.d/98-script-setup.bash#L31-L36

Bait-Fish commented 1 year ago

Not sure if 289 will address the daily rescan issue. I'll keep it going to see.

Posting right now to update on the speed of 289. It is much slower with this method. At the moment, my progress is,

11/20/2022 10:23:36 PM 2022-11-20 22:23:36 :: Audio :: 1.0.289 :: Downloading page 829... (20700 - 20725 of 56999 Results)

And that's from starting around 9 am today. Prior to 289 it would have been done in 6 hours or less. As much as I like seeing results come in sooner instead of waiting 2+ hours, the overall time delay/extension now does not seem worth it.

But that isn't what we're fixing. I'll keep watching for a change in daily rescan time and let this thing run.

RandomNinjaAtk commented 1 year ago

@Bait-Fish update to the latest, tried to speed it up a bit more... honestly, it can only go so fast...

Bait-Fish commented 1 year ago

I'm holding off stopping/updating/restarting for the time being to see if the OP issue is addressed. If I do not see the rescan event by hour 28 or 30, I'll update.

Thinking through my own problem of excessive scan times, I created a custom filter in Lidarr for missing tracks (monitored=true, track progress<100, status=ended). Then I sort from largest. And I am now unmonitoring those artists/groups. These are big, old, well-known names which I am certain are the cause of my ballooning list.

So, user error there. If a group is ended, or artist deceased, monitoring is not necessary at a certain point. Always learning . . .

edit for grammar

edit x2 for update on the above: Removing 99% of deceased and ended artists/groups from being monitored indeed brought the albums being processed down from 56k to 35k.
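The manual filter-and-unmonitor pass described above could in principle be scripted against Lidarr's v1 API. A minimal sketch in Python; the host/port, API key, and the `status`/`statistics.percentOfTracks` field names are assumptions based on the general *arr API shape and should be verified against your own instance before use.

```python
import json
import urllib.request

BASE = "http://localhost:8686/api/v1"  # hypothetical host/port
API_KEY = "your-lidarr-api-key"        # hypothetical key

def ended_incomplete(artists):
    """Select monitored artists whose status is 'ended' and whose track
    completion is below 100% (field names are assumptions, verify them)."""
    picked = []
    for artist in artists:
        stats = artist.get("statistics", {})
        if (artist.get("monitored")
                and artist.get("status") == "ended"
                and stats.get("percentOfTracks", 100) < 100):
            picked.append(artist)
    return picked

def unmonitor(artist):
    """PUT the artist record back with monitored=False (untested sketch)."""
    artist["monitored"] = False
    req = urllib.request.Request(
        f"{BASE}/artist/{artist['id']}?apikey={API_KEY}",
        data=json.dumps(artist).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return urllib.request.urlopen(req)
```

The idea would be to fetch the artist list from GET /artist, run `ended_incomplete` over it, review the selection by hand, and only then unmonitor each entry.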

I'll keep watching the rescan problem.

One last question while I have your attention. My Audio and Video log files no longer rotate to Audio1, Video1, etc. This makes for long logs and a difficult-to-watch start sequence. I had updated the extra parameters to match changes to the template. On seeing that removed, I removed the command. But the behavior seems to have persisted. Suggestion?

Bait-Fish commented 1 year ago

I may have spotted something odd. As the indexing count ratchets up, at some point the count reverts back to #3. See timestamp 2022-11-21 15:33:53.

I had to truncate the middle of the file where artists 1-42 were processed. Pastebin complained about size.

https://pastebin.com/9DJbWWew

RandomNinjaAtk commented 1 year ago

Odd, just from memory, I don’t think there is anything that would or could cause that based on the design. But I’ll take another look at it tomorrow.

Bait-Fish commented 1 year ago

The counter reversion also happened on 289. It had counted up to 1150 (timestamp 2022-11-21 03:09:59), processed 10 items, then went back to 3 (2022-11-21 03:20:17).

https://pastebin.com/8EJaFnQr

RandomNinjaAtk commented 1 year ago

That is expected/normal behavior

RandomNinjaAtk commented 1 year ago

@Bait-Fish - You need to update to latest; I will not troubleshoot old code. Also, since it's an issue unrelated to this report, you should open a separate issue.

Bait-Fish commented 1 year ago

I have not observed the long rescan issue since. All I have done is remove ended/deceased artists from monitoring and update as the dev recommends. A caveat: LE has only run about 28 hours each time and may not be hitting the rescan time.

Edit 1: Of course, after I posted that it starts to rescan not long after. I have a rescan that's going for 3 hours plus now.

steve1977 commented 1 year ago

Same here, it still has endless rescan for me.

One more insight for troubleshooting. I disabled both audio & video script (running usenet instead). And the issue with the long rescan is still there.

RandomNinjaAtk commented 1 year ago

Again, you need to disable the post processing scripts. I have a feeling that is what is causing it. But honestly, if you want to use them, and if that is the case, there is nothing to be done about it….

steve1977 commented 1 year ago

There are only two scripts there?

MetadataPostProcess.bash
PlexNotify.bash

If I disable them, no more tagging takes place?

RandomNinjaAtk commented 1 year ago

Lidarr mainly does the tagging, assuming you have it enabled

steve1977 commented 1 year ago

So, this script takes care of Replaygain and embeds lyrics?

Or are you referring to the audio-pp.bash post-processing done by sabnzbd-extended?

RandomNinjaAtk commented 1 year ago

This is unrelated to any other containers.

versatilist commented 1 year ago

Can confirm this issue. While updates are nice, this means that the script seems to run constantly (on startup, and nightly scheduled). Starting my lidarr now takes 22:09:55. This must run before the deemix script can run, so it really makes checking any change you make (version updates) or I make (config updates) a non-starter.

STATUS :: LIDARR BUSY

Perhaps you can fork lidarr and make it possible to disable these Scheduled Tasks. I don't see much reason to rescan and touch each file every day anyway.

(I did disable the Connection script on PostProcess, but I don't see how that's related to this issue, as the script only applies when a file is retagged.)

RandomNinjaAtk commented 1 year ago

I will not be making any changes to lidarr itself. That is out of scope and beyond my skill set.

The reason for disabling the post processing scripts is that my theory is the scripts are adding files to the final directories (lyrics, cover art) or possibly changing the modify times, so Lidarr sees the changes and detects them, which causes an extra-long rescan as a result. So not having any post processing scripts should mean that only Lidarr itself touches the files, and should prevent it.

But again, this is just a theory and you’d have to run it for a while to see if that is really the case.

Bait-Fish commented 1 year ago

I'm trying a new LE container (Unraid) that has nothing changed from defaults, audio/video scripts disabled, and post processing removed (just now). I have it scanning the same library and will see in a few days how it behaves. I'll report back then.

versatilist commented 1 year ago

With AMD the configuration could be updated, you could make changes to the script and push (frequent) changes as deezer changed, and there was also a separate logfile (terminal window) to track progress. I could also kill the docker or manually stop just that portion. While lidarr was running, stable, and without a restart.

Most importantly the foreground Task in Lidarr did not interrupt the AMD script.

Now, while it appears more cohesive, it actually works less reliably, as each time Lidarr is restarted it needs an extensive cleanup process, and your add-ons are interrupted by these Tasks.

I think I will check out the steroids approach where deemix exists parallel to trackers / indexers and lets lidarr manage it as one of three options to grab a download.

Hopefully you can reconsider, and maybe even bring back AMD. If you are not into making changes to Lidarr I think this merged approach may be flawed.

steve1977 commented 1 year ago

Let's wait for @Bait-Fish's results. I actually love how it has been integrated. Now running sabnzbd/sonarr/radarr all in the extended version and like the additional benefits it brings.

I would not even mind not using the audio script, but the video script is quite gold... Unfortunately, even disabling both audio & video scripts and just downloading via usenet leads to this "bug". It is odd, though, as it runs really well for quite some hours (including occasional short folder rescans). But after some time, it "hangs" as we noticed. I don't think we have yet identified the root cause of the issue.

Bait-Fish commented 1 year ago

The test container finished its first scan in just over 2 days' time. I'm only peeking in on its status every once in a while. It's already rescanning, 6 hours so far. I'll keep monitoring without any other Lidarr instances involved. Maybe it will settle out and shorten after a couple days.

I have a thought to next test this library against plain Lidarr as well.

EDIT 1: The scan finished in 6 hrs and 8 mins, mere minutes after I posted above. Tasks indicates it will run 11 hours from now, so about 7 pm pst. I do not recall this task detail remaining for so long. Rescan Folders was running many times in between the daily rescan.

EDIT 2: The following scan did not start in 11 hours as LE indicated. It started after 12 pm pst, 24 hours after the previous scan. So far, it is LONGER than yesterday's, 8 hours and counting. . .

versatilist commented 1 year ago

I also notice that there are a number of pending tasks in the Queue (below Scheduled on Tasks). Mostly Rescan Folders.

After update and restart, It is currently running a scheduled Rescan Folders from 13 days ago. I manually removed two dozen scheduled tasks, including the one added on startup. For anyone with this issue, I strongly suggest you manually press the 'x' next to each scheduled task to clear out.
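Clearing the queue by hand could probably also be scripted. In the *arr applications, GET /api/v1/command lists pending commands and a DELETE on a command id cancels it; treat the endpoint, the command name, and the status values below as assumptions to verify against your own Lidarr instance.

```python
import urllib.request

BASE = "http://localhost:8686/api/v1"  # hypothetical host/port
API_KEY = "your-lidarr-api-key"        # hypothetical key

def stale_rescans(commands):
    """Pick queued RescanFolders commands out of a /command listing
    (the 'name' and 'status' values here are assumptions)."""
    return [c for c in commands
            if c.get("name") == "RescanFolders" and c.get("status") == "queued"]

def cancel(command_id):
    """Cancel a command by id via DELETE (untested sketch)."""
    req = urllib.request.Request(
        f"{BASE}/command/{command_id}?apikey={API_KEY}",
        method="DELETE",
    )
    return urllib.request.urlopen(req)
```

This would be the scripted equivalent of pressing the 'x' next to each queued task, which may or may not survive a restart any better than the manual route.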

Note: the video script DOES run every 15 minutes and seems to run parallel to RescanFolders (inline in logs), but audio:

STATUS :: LIDARR BUSY :: Pausing/waiting for all active Lidarr tasks to end...

RandomNinjaAtk commented 1 year ago

Video is not dependent on Lidarr for imports, that's why there is no pausing for that script. The audio script pauses to keep everything running smoothly in terms of imports and processing by Lidarr.

steve1977 commented 1 year ago

Unfortunately, the issue goes beyond pausing the script though. During the endless rescan, all importing is also stopped (including importing files downloaded via usenet).

Let's see what @Bait-Fish's trial with the stock lidarr result in. Maybe something was changed in the mainline that triggered this issue independent of the script. It is odd that this issue even exists when disabling the scripts.

Bait-Fish commented 1 year ago

So far, extended is not at fault in my opinion. This should be tested against a plain Lidarr install as well, which I have not yet done. [hoping someone else helps with that test, I will be delayed]

This current test (a fresh container with default LE that never had audio/video scripts enabled, and now has post processing disabled) is also running very long folder rescans. It has not yet stopped: 19 hours and continuing on this last daily run.

To recap this test, The "first start" folder rescan took >48 hours. The following day, 6 hours and almost 15 minutes. And this current one, 19 hours and counting.

The reason I quoted "first start" above is that I first set this container up as a pure extended install a month prior. I let it run for two days and gave up. So that is one wrinkle in this test. After cancelling, I returned to using my regular LE container. So, for the last test, the >48-hour "first start" run was a restart/continuation of the true first start, which I interrupted in frustration.

I think I'll run this another day, maybe two. I ask that someone else with this similar problem try their library against a plain Lidarr (not LE) install. I have some needed Unraid software/hardware updating and rebooting to do. I'll get that going once this test is done. In the interest of progress, more parallel testing would speed us up.

EDIT 1: I remember running across this post about long Lidarr scan times previously: "Allow the rescan process to be stopped" (https://github.com/lidarr/Lidarr/issues/1105)

At the time nothing clicked for me to associate with my problem. But on rereading, the dev mentioned unmapped files causing delay. I mentioned previously, my library is not pristine. Now checking, I do indeed have thousands of unmapped files. A week or two ago, I started but did not finish addressing these (delete). After the current test/trial is done, I'll focus on those to clean up.

Current scan now at 20 hours and counting.

EDIT 2 (11/30/22): This scan is still going, about 44 hours and counting.

EDIT 3 (11/30/22 2:36 PM PST): The folder rescan finished during a period I was not able to check in on it. And another quick one replaced the last scan info on the tasks status page. So, just guessing, it went about 46 hours judging by Unraid's disk stats. I'll see what the next daily scan does, whether this will settle out. Not holding my breath.

EDIT 4 (12/1/22): I was looking back in the Lidarr logs and saw up to 30 file permission errors, typically on .nfo files. Fixing these seems to have triggered a rescan. It only went for 1.5 hours, and says the next rescan will be tomorrow. I may be shutting the system down tonight for a hardware update.

There are other issues it logged as well, aside from what I assume are ~25k unmapped files, logged as such: [Info] ImportDecisionMaker: Reading file 17983/24578. I have a mind to just delete and try again, unless someone has a better idea.

The other issue that seems common is like this: [v1.1.0.2649] System.IO.DirectoryNotFoundException: Could not find a part of the path '/data/media/music/Peter Gabriel (formerly of Genesis)'. In /music the subfolder is merely /Peter Gabriel, without the extra parenthetical. Looking at MusicBrainz, they have it with the parenthetical, mostly, but not always in each case. It seems I need to fix these too.

versatilist commented 1 year ago

After restart, the deleted tasks returned to the Queue. It is again running the Rescan Folders from 14 days ago.

Stopped again at 11761/12774. Especially odd since there are 48565 albums in the db.

Perhaps there was db corruption with a version around this time.

What is the best procedure to migrate a vanilla lidarr db into extended?

RandomNinjaAtk commented 1 year ago

If you want to start clean: stop the container, purge the /config directory, then restart the container and it'll be like you just installed it.

steve1977 commented 1 year ago

@Bait-Fish did a clean install and the issue still existed, so I don't think this issue can be due to a dated database corruption?

I believe the key open question now is whether the issue only occurs in LE or also in plain vanilla Lidarr. Has someone tried this out on the same library?

versatilist commented 1 year ago

I did a fresh install, keeping only the Lidarr.db from vanilla.

This seems to have improved the issue, less frequent Rescan Folders. When run, only about 14mins for ~8TB. Still runs ~hourly instead of Nightly.

DB size (vanilla): 1GB (13GB covers)
DB size (extended; corrupted/rescan bug): 1.2GB (13GB covers)
DB size (extended; clear /config, keep db): 1.4GB (19GB covers)

Refresh Artist took about 10hrs and is still running, because of all the cover art. I noticed a ton of cleanup log entries.

It also seemed to find a few albums that vanilla Lidarr did not previously import properly.

EDIT: After a full system reboot, it is again doing the slower check-each-file Rescan Folders at 40-50MB/s. And it again has a hangup: 2022-11-30 12:02:00.8|Info|IdentificationService|Identifying album 11773/12767. That must mean there were 12 albums found during the first launch, and one album is causing the script to hang.

EDIT 2: Removed all directories with odd characters, and reset the permissions. Seems like the script passed the 11773 mark. Now downloading deemix again.

However, with this new install downloads are not showing up in the history:

[Error] EventAggregator: EntityHistoryService failed while processing [TrackImportedEvent]

[v1.1.0.2649] System.NullReferenceException: Object reference not set to an instance of an object.
  at NzbDrone.Core.History.EntityHistoryService.Handle(TrackImportedEvent message) in D:\a\1\s\src\NzbDrone.Core\History\EntityHistoryService.cs:line 220
  at NzbDrone.Core.Messaging.Events.EventAggregator.PublishEvent[TEvent](TEvent event)

versatilist commented 1 year ago

I found the reason for the DB size discrepancy; my default Standard profile does not include singles. Once removed and refreshed, the size is back as before.

I found that if I simply restart the docker after the Rescan Folders task has started, it will not resume nor start again until next scheduled. Then the RNA amd script will countdown for liftoff.

Bait-Fish commented 1 year ago

This may be a way to delay the daily rescan to something like once every 7 days (10080 minutes). This link mentions how to do it in Sonarr: https://www.reddit.com/r/sonarrv3/comments/fhtvsc/change_sonarr_refresh_times/ but the names are different in Lidarr.

In Lidarr's appdata, lidarr.db can be opened with a SQLite browser, such as "sqlitebrowser" by linuxserver's Repo on Unraid.

So in my LE's appdata, in lidarr.db, I can select ScheduledTasks, then Browse Data. Here I see these TypeNames (manually typed with cold hands, may have spelling errors):

NzbDrone.Core.Download.RefreshMonitoredDownloadsCommand (Interval 1)
NzbDrone.Core.Messaging.Commands.MessagingCleanupCommand (Interval 5)
NzbDrone.Core.Update.Commands.ApplicationUpdateCheckCommand (Interval 360)
NzbDrone.Core.HealthCheck.CheckHealthCommand (Interval 360)
NzbDrone.Core.Music.Commands.RefreshArtistCommand (Interval 1440)
NzbDrone.Core.MediaFiles.Commands.RescanFoldersCommand (Interval 1440)
NzbDrone.Core.Housekeeping.HousekeepingCommand (Interval 1440)
NzbDrone.Core.Backup.BackupCommand (Interval 10080)
NzbDrone.Core.ImportLists.ImportListSyncCommand (Interval 1440)
NzbDrone.Core.Indexers.RssSyncCommand (Interval 15)

So, I'm guessing, change NzbDrone.Core.MediaFiles.Commands.RescanFoldersCommand's interval to 10080 minutes for once every 7 days. -yes, applied (interval now "7 days")

The link above also changed RefreshMonitoredDownloadsCommand to 10 (from 1 in my LE default). Maybe that too? -yes, applied (interval now "10 minutes")

Any suggestions? I'm trying these right now. I've been having trouble getting past an initial start scan after having my primary LE container off for a couple weeks. I stopped it after almost three days of continuous scanning. sigh NOTE: DB size is 3.6 GB.

On applying the changes, I now see System>Tasks>Scheduled>Rescan Folders next execution listed as "in 7 days." I think I'll let this settle, even if it takes three days to finish this current scan to see how it goes.

RandomNinjaAtk commented 1 year ago

I prefer not to touch/modify lidarr at all in any way.

versatilist commented 1 year ago

I tried to touch it. The (db) change did not stick. Looks like both Lidarr and Sonarr (when the codebase was similar) were hard-coded to use their defaults regardless of what the sqlite db says. See:

src/NzbDrone.Core/Jobs/TaskManager.cs

        var defaultTasks = new[]
            {
                new ScheduledTask { Interval = 1, TypeName = typeof(RefreshMonitoredDownloadsCommand).FullName },
                new ScheduledTask { Interval = 5, TypeName = typeof(MessagingCleanupCommand).FullName },
                new ScheduledTask { Interval = 6 * 60, TypeName = typeof(ApplicationUpdateCheckCommand).FullName },
                new ScheduledTask { Interval = 6 * 60, TypeName = typeof(CheckHealthCommand).FullName },
                new ScheduledTask { Interval = 24 * 60, TypeName = typeof(RefreshArtistCommand).FullName },
                new ScheduledTask { Interval = 24 * 60, TypeName = typeof(RescanFoldersCommand).FullName },
                new ScheduledTask { Interval = 24 * 60, TypeName = typeof(HousekeepingCommand).FullName },

==

I'm back to a borked instance after two weeks. Same issues as before:
-- The Rescan Folders task runs forever
-- Each docker restart, and each scheduled time, adds a new instance of the task to the queue
-- Thus, the number of scheduled Rescan Folders grows larger with each day
-- The 'x' in the queue does not delete old tasks

On a very rare occasion I can delete all the 'x' tasks and restart the docker

Please consider a small patch to Lidarr to change the default folder scan to at least weekly, if not monthly.

RandomNinjaAtk commented 1 year ago

The issue you're experiencing is a problem with Lidarr. I'm not a software developer, and this project will not be making changes to Lidarr itself. If you want Lidarr changed, you'll need to request it in Lidarr support channels or write a PR for it to the upstream Lidarr project.

Lidarr here should behave no differently than it currently does in the upstream container…

versatilist commented 1 year ago

I respect your position. Understandably, neither of us want to debug or change source of Lidarr.

Please consider either:

Unfortunately my issue may be tied to architectural changes with lidarr when moving to v8. In particular, the Unmapped Files of the many albums I have that are not yet in musicbrainz. Hopefully the database change fixes the issue for others in this thread.

RandomNinjaAtk commented 1 year ago

Your issue is with lidarr, so having this in a separate container won't help, because the issue is still with lidarr. If you want the Lidarr team to look at the issue, you need to go through the correct support channels and not here....

RandomNinjaAtk commented 1 year ago

The conversation here is unproductive at this point and going in circles. Lidarr is an unmodified application in this container, and the suggested mitigation is to disable all optional post-processing scripts (Lidarr Web App -> Settings -> Connect).

No evidence has been presented to prove the additional scripts in the container are causing the problem.

If you want changes made to Lidarr itself, you need to use Lidarr upstream support channels to address your particular issue. Link: https://github.com/Lidarr/Lidarr

RandomNinjaAtk commented 1 year ago

@versatilist @steve1977 @Bait-Fish @bmatheny

I noticed a new setting recently in Lidarr Settings -> MediaManagement -> File Management :: "Watch Root Folders for file changes"

By default it appears to be enabled, maybe disable it and see if it improves the situation...

Edit: Also changing this setting may help: "Rescan Artist Folder after Refresh" :: always -> after manual refresh or never...

Bait-Fish commented 1 year ago

So far today, with the setting you mention at its default (enabled), rescanning has occurred in 1-hour chunks before finishing. This has accumulated to 10 hours of rescanning over the last 12 hours, leaving only two hours of active script time.

Just made the suggested change and will monitor and see if it helps. Thanks again for your work, and support.

RandomNinjaAtk commented 1 year ago

I might be able to come up with a workaround that would only ignore the rescan task to allow the script to continue. All other tasks would still cause it to pause though. Which might be a good compromise.

I would still like to know what the root cause is, but not sure we'll find that out...