BlackHoleFox opened this issue 3 months ago
Can you disable the watcher in server settings as a test to see if you get duplicates on any episodes downloaded after that?
Turned it off and restarted audiobookshelf's container. I'll clean up the duplicates in the two library items I mentioned and see if they come back.
Update: It seems to be the library scanner causing these? It runs periodically, which would explain why they show up at a delay. It ran automatically in the last hour or two and the duplicates showed up. I cleaned out the duplicates from both podcasts once more and manually started a scan from Settings --> Libraries --> Scan. The scan finished and told me some items updated:
And then they showed up again on the podcast's library page. All the duplicated ones also showed up in the Newest Episodes/recently-added row. The actual newest episode is one to the right, off screen:
I can reliably reproduce this now with the steps above.
Can you share some about how you are mapping volumes in docker and what file system/OS you are using?
Yeah, here you are: the filesystem is ext4, but see below for the rest. My audiobookshelf data is split between two places: the metadata and config are stored on the host, but the podcasts themselves are stored elsewhere (for size and reliability reasons) and mounted via SMB on the host running the container. From there, the podcasts folder is bind-mounted into Docker so audiobookshelf can see it. Here's the relevant part of my docker-compose config:
volumes:
- /home/user/deployment-data/audiobookshelf/audiobooks:/audiobooks:rw
- /mnt/media/podcasts:/podcasts:rw
- /home/user/deployment-data/audiobookshelf/metadata:/metadata
- /home/user/deployment-data/audiobookshelf/config:/config
@advplyr Hiyo, I'm back with more info after more digging around. I started from the hypothesis that this might be something specific to my SMB share setup, but after reviewing all the audiobookshelf logs with debug logging turned on, I saw nothing pointing at the storage location as the cause of the weird broken episode duplicates.
To start off, I double- (and triple-) checked that there aren't duplicates in the RSS feed:
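The check itself was by eye, but it boils down to something like this sketch: parse the feed XML and count GUIDs. The feed snippet below is a stand-in, not the real feed contents:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Stand-in RSS document -- in practice this would be fetched
# from the podcast's actual feed URL.
rss = """<rss version="2.0"><channel>
  <title>Example Podcast</title>
  <item><guid>ep-81</guid><title>Episode 81</title></item>
  <item><guid>ep-82</guid><title>Episode 82</title></item>
</channel></rss>"""

root = ET.fromstring(rss)
guids = [item.findtext("guid") for item in root.iter("item")]
dupes = [g for g, n in Counter(guids).items() if n > 1]
print(dupes)  # → [] (no duplicate GUIDs in the feed)
```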
And now: here are the debug and scanner logs from initiating a scan after removing one of the duplicate podcast episodes from the UI. It's a little odd that it says "updated" here while the scanner log says "new podcast episode".
[LibraryScanner] Library scan e1099d4c-caca-4ea5-a6aa-6508b2343855 completed in 0:07.0 | 0 Added | 1 Updated | 0 Missing
[ApiCacheManager] library.afterUpdate: Clearing cache
[LibraryScan] Scan log saved "/metadata/logs/scans/2024-04-28_e1099d4c-caca-4ea5-a6aa-6508b2343855.txt"
[LibraryController] Scan complete
And then the relevant part of the scan log. The rest of it just says every other podcast is up to date:
{"timestamp":"2024-04-28T04:35:57.662Z","message":"New library file found with path \"/podcasts/shared/A More Civilized Age - A Star Wars Podcast/82 - Always Two There Are and Brothers of the Broken Horn (Rebels 20 - 21).mp3\" for library item \"A More Civilized Age - A Star Wars Podcast\"","levelName":"INFO","level":2}
{"timestamp":"2024-04-28T04:35:57.663Z","message":"Library item \"A More Civilized Age - A Star Wars Podcast\" changed: [size,lastScan]","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.246Z","message":"Mapping metadata to key tagComment => description: TRIMMED","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.249Z","message":"Mapping metadata to key tagSubtitle => subtitle: TRIMMED","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.251Z","message":"Mapping metadata to key tagDate => pubDate: Wed, 20 Mar 2024 10:00:00 +0000","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.252Z","message":"Mapping metadata to key tagTitle => title: 82: Always Two There Are and Brothers of the Broken Horn (Rebels 20 - 21)","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.253Z","message":"New Podcast episode \"82: Always Two There Are and Brothers of the Broken Horn (Rebels 20 - 21)\" added","levelName":"INFO","level":2}
{"timestamp":"2024-04-28T04:35:58.298Z","message":"Mapping metadata to key tagAlbum => title: A More Civilized Age: A Star Wars Podcast","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.299Z","message":"Mapping metadata to key tagArtist => author: A More Civilized Age","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.301Z","message":"Mapping metadata to key tagGenre => genres: TV & Film:After Shows, Leisure:Animation & Manga","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.302Z","message":"Mapping metadata to key tagLanguage => language: en","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.303Z","message":"Mapping metadata to key tagPodcastType => podcastType: episodic","levelName":"DEBUG","level":1}
{"timestamp":"2024-04-28T04:35:58.311Z","message":"Found metadata file \"/metadata/items/51b244e1-bfd2-4e50-8bc3-a9e81954d099/metadata.json\"","levelName":"INFO","level":2}
{"timestamp":"2024-04-28T04:35:58.407Z","message":"Library item \"Scanline Media Patron Podcasts\" is up-to-date","levelName":"DEBUG","level":1}
When I remove one of these broken episodes, nothing indicates a problem:
[LibraryItem] Library item "51b244e1-bfd2-4e50-8bc3-a9e81954d099" updated
Looking at the all-episodes list, both the normal and the broken episode appear (stealth strike). The second goes away temporarily when I delete it from the UI, but it always returns after a library scan.
So, with not much progress on the surface, I cloned the database to my computer and opened it as well :) Despite deleting the broken episodes in the UI, they still had a database entry, which looks very wrong compared to the correct one:
There are a bunch of "haunted" items like this across multiple podcasts in my library, and they all have broken duplicate database entries:
The interesting/important thing is that when I deleted the broken record from the podcastEpisodes table, pushed the modified database to my server, and started the Docker container, it properly scrubbed the haunted episode away. When I scan the library now, everything reports up to date and the duplicate doesn't reappear. Checking the RSS feed for new episodes doesn't find any junk either.
Pretty puzzled how they got there, honestly. Nothing weird or dangerous has been done to the database or filesystem. Maybe it's the result of broken RSS feeds churning and leaving remnants in the database when creators re-upload episodes?
So that was a lot, but I'm interested in what you think of the broken database rows, as someone who actually knows what everything should look like. Maybe a possible fix could be improving the scanner to look for any rows with a NULL index (and maybe no enclosureSize or enclosureType either) and deleting them as a cleanup task? I could also manually scrub the database once, but these might eventually trickle back in, so an automatic purger could be better.
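To make the suggestion concrete, here's a minimal sketch of the kind of query I mean. The table and column names (podcastEpisodes, index, enclosureSize, enclosureType) are what I saw in my database, but the stripped-down schema below is a stand-in for illustration, not Audiobookshelf's real one:

```python
import sqlite3

# Minimal stand-in schema -- NOT Audiobookshelf's real table definition,
# just enough to illustrate the "NULL index + no enclosure" heuristic.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE podcastEpisodes (
        id TEXT PRIMARY KEY,
        title TEXT,
        "index" INTEGER,
        enclosureSize INTEGER,
        enclosureType TEXT
    )
""")
con.executemany(
    "INSERT INTO podcastEpisodes VALUES (?, ?, ?, ?, ?)",
    [
        ("real-ep", "82: Always Two There Are", 82, 123456, "audio/mpeg"),
        ("ghost-ep", "82: Always Two There Are", None, None, None),
    ],
)

# Find suspect rows: no episode index and no enclosure metadata.
suspects = con.execute("""
    SELECT id FROM podcastEpisodes
    WHERE "index" IS NULL
      AND enclosureSize IS NULL
      AND enclosureType IS NULL
""").fetchall()
print(suspects)  # → [('ghost-ep',)] -- only the haunted row matches
```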
The enclosure fields are only populated if the episode was downloaded from an RSS feed. Not all podcast episodes need to come from an RSS feed; you may have audio files in your file system that are a podcast but don't have an associated RSS feed. So we wouldn't want to remove them automatically from the db like you are suggesting.
That is what the duplicates you are seeing are. They are episodes that were scanned in from the file system and were not matched with an existing episode while being scanned in so a new episode was created.
Still, more information is needed to figure out what is going on. If you can enable Debug logs in the Logs page of settings, this will provide more information during the scan. Then re-create the issue and we should see where both episodes are being created: one will be from the scanner, while the other will be created during the RSS feed download.
Also, if you can, be on the latest version v2.9.0
So we wouldn't want to remove them automatically from the db like you are suggesting.
Got it 👍. Does the same train of thought apply to the NULL index field observation as well?
They are episodes that were scanned in from the file system and were not matched with an existing episode while being scanned in so a new episode was created.
Sounds pretty strange, honestly. I don't manually edit the filesystem Audiobookshelf is working with, and I get everything from RSS. Could filesystem metadata changes (mtime, ACLs, etc.) break this, and only for a specific set of episodes? There have also never been duplicate files for these episodes on disk. The ghost entry points to the exact same MP3 file the real episode does, which is why I can't check the "hard delete" option for these ghost episodes in the UI, or it would delete the source file and leave the true episode broken.
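On the metadata hypothesis: I don't actually know which attributes Audiobookshelf's scanner compares for change detection, but mtime, size, and inode are the usual candidates, and SMB mounts can report these inconsistently. A sketch of the kind of snapshot I mean (the file here is a stand-in, not a real episode on my /podcasts mount):

```python
import os
import tempfile

# Write a small stand-in file; in my setup this would be an episode MP3
# on the SMB-backed /podcasts mount.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
    f.write(b"fake mp3 bytes")
    path = f.name

# Snapshot the attributes a scanner might key off.
# (Assumption: I don't know exactly which of these Audiobookshelf checks.)
st = os.stat(path)
snapshot = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "inode": st.st_ino}
print(snapshot["size"])  # → 14 for this stand-in file

os.unlink(path)  # clean up the temp file
```

If the SMB server ever reported a different inode or mtime for an unchanged file, a size-or-mtime comparison would still pass, but an inode-based one wouldn't, which could plausibly explain why only some episodes are affected.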
If you can enable Debug logs in the Logs page of settings this will provide more information during the scan.
All the logs I shared above were with Debug logging enabled :rip: There wasn't any more detail available from the scan that "brought back" a deleted ghost episode.
Then re-create the issue and we should see where both episodes are being created, one will be from the scanner while the other will be getting created during the RSS feed download.
Yeah I will try, but I still don't know what actually causes these to be created. They have just randomly appeared so far.
Also if you can be on the latest version v2.9.0
I updated a few days ago, so all the logs above come from 2.9.0 :) Appreciate the help so far though.
Describe the issue
Audiobookshelf seems to be generating, or somehow finding, duplicate episodes of multiple podcasts I have added and listen to. This is similar to https://github.com/advplyr/audiobookshelf/issues/2122 but doesn't create duplicate files on disk and happens in versions after that bugfix.
Weirdly this doesn't happen with every single podcast in my library, just a few. The ones I notice the most are:
Nothing jumps out in my logs either, though let me know if I should enable debug logs for a week or so to help.
I've deleted duplicates from the library multiple times at this point, but they just seem to be coming back. For example, AMCA's library entry has every new episode duplicated right now:
One of each is not linked to an RSS episode:
And there are no duplicates on disk; they exist purely in the database/UI. They show up duplicated in both the web UI and the Android app:
Steps to reproduce the issue
Schedule Automatic Episode Downloads. I have it running every day at midnight.

Audiobookshelf version
2.8.0
How are you running audiobookshelf?
Docker