ZeroQI / YouTube-Agent.bundle

Plex Metadata Agent for Movies and TV Series libraries

Last version crash when underscores are present #123

Closed MizeryMade closed 1 year ago

MizeryMade commented 1 year ago

It seems the change for "Make YouTube regex compatible with TubeArchivist" has broken cases where the script was previously working fine. Maybe the regex needs another look?

CURRENT:

2023-01-16 11:59:03,984 (25f8) : INFO (logkit:16) - populate_episode_metadata_from_api() - filename: {Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
2023-01-16 11:59:03,984 (25f8) : INFO (logkit:16) - # videoId [vp9-142kbit] not in Playlist/channel item list so loading json_video_details
2023-01-16 11:59:04,025 (25f8) : DEBUG (networking:138) - Fetching 'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id=vp9-142kbit&key={KEY}' from the HTTP cache
2023-01-16 11:59:04,072 (25f8) : INFO (logkit:16) - Error: "list index out of range"
2023-01-16 11:59:04,072 (25f8) : INFO (logkit:16) - [ ] genres: "[]"
2023-01-16 11:59:04,072 (25f8) : INFO (logkit:16) - === End Of Agent Call, errors after that are Plex related ===

REVERTED:

2023-01-16 11:59:54,108 (41ac) : INFO (logkit:16) - populate_episode_metadata_from_api() - filename: {Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
2023-01-16 11:59:54,108 (41ac) : INFO (logkit:16) - # videoId [oyWJXX1CZiE] not in Playlist/channel item list so loading json_video_details
2023-01-16 11:59:54,151 (41ac) : DEBUG (networking:138) - Fetching 'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id=oyWJXX1CZiE&key={KEY}' from the HTTP cache
[...]
2023-01-16 11:59:54,938 (41ac) : INFO (logkit:16) - === End Of Agent Call, errors after that are Plex related ===

ZeroQI commented 1 year ago
Old YOUTUBE_VIDEO_REGEX = Regex('\[(?:youtube\-)?(?P<id>[a-z0-9\-_]{11})\]', Regex.IGNORECASE)
New YOUTUBE_VIDEO_REGEX = Regex('[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]', Regex.IGNORECASE)

The new regex yields videoId [vp9-142kbit], which is wrong; it should be videoId [oyWJXX1CZiE].
File: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
It's the underscores that mess it up, but they are supposed to be at the beginning and the end...
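The mismatch described above can be reproduced outside Plex. The following is a minimal sketch that assumes Python's standard `re` module behaves like the agent's `Regex(...)` helper and that the agent searches anywhere in the filename; the names `OLD`, `NEW`, and `FILENAME` are illustrative only.

```python
import re

# Filename taken from the log excerpt in this issue.
FILENAME = ("{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: "
            "A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv")

# Assumption: re.compile(..., re.IGNORECASE) stands in for Regex(..., Regex.IGNORECASE).
OLD = re.compile(r'\[(?:youtube\-)?(?P<id>[a-z0-9\-_]{11})\]', re.IGNORECASE)
NEW = re.compile(r'[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]', re.IGNORECASE)

old_id = OLD.search(FILENAME).group('id')
new_id = NEW.search(FILENAME).group('id')

print(old_id)  # -> oyWJXX1CZiE
print(new_id)  # -> vp9-142kbit
```

The new pattern scans left to right and stops at the first 11-character run enclosed by `_` or brackets, which here is `_vp9-142kbit_`, well before the bracketed ID.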

https://github.com/ZeroQI/YouTube-Agent.bundle/blob/6be09f20fcede15fdf7a6c228e442433a299e938/Contents/Code/__init__.py#L574
Please replace line 574 with the line below and report back, as I do not use this agent (despite being the main coder) and am correcting blind:
YOUTUBE_VIDEO_REGEX = Regex('^[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]$', Regex.IGNORECASE)

MizeryMade commented 1 year ago

That change results in it being unable to find a VideoID in the filename.

2023-01-16 17:05:21,092 (44a0) : INFO (logkit:16) - populate_episode_metadata_from_api() - filename: {Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
2023-01-16 17:05:21,092 (44a0) : INFO (logkit:16) - videoId not found in filename
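The "videoId not found" result is consistent with how `^` and `$` behave: once anchored, the pattern only matches when the whole string is the delimited ID. A minimal sketch, again assuming Python's `re` stands in for the agent's `Regex` helper (the variable names are illustrative):

```python
import re

FILENAME = ("{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: "
            "A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv")

# The anchored variant suggested in the previous comment.
ANCHORED = re.compile(r'^[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]$', re.IGNORECASE)

# The filename starts with "{" and ends with ".mkv", so neither anchor can be satisfied.
anchored_match = ANCHORED.search(FILENAME)

# The pattern only succeeds when the input is nothing but the delimited ID.
bare_match = ANCHORED.search("[oyWJXX1CZiE]")

print(anchored_match)          # -> None
print(bare_match.group('id'))  # -> oyWJXX1CZiE
```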

I didn't dig too far into the project, but it looks like TubeArchivist is hardcoded to make yt-dlp download files with the date and video ID at the beginning of the filename, such as "20221213_oyWJXX1CZiE_". The change made to support that will therefore pick out any string of 11 characters sandwiched between two underscores, whereas your script has expected the ID to be contained in square brackets.

Granted, my inclusion of the encoder and other information isn't exactly standard compared to your docs, but "--restrict-filenames" is suggested, which, if I recall correctly, replaces spaces with underscores in the filename and could lead to others having issues. I'm not sure there is a way to support your established requirement of having the ID in brackets while also supporting TubeArchivist's approach of sandwiching it between underscores.
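To illustrate the concern about underscore-separated titles, here is a sketch with a made-up filename. It assumes, as recalled above, that spaces end up as underscores; the filename and variable names are hypothetical and not taken from any real download.

```python
import re

# Hypothetical filename: underscores replace spaces, the ID is still in brackets.
HYPOTHETICAL = "Live_performance_2022_[oyWJXX1CZiE].mkv"

BRACKET_ONLY = re.compile(r'\[(?:youtube\-)?(?P<id>[a-z0-9\-_]{11})\]', re.IGNORECASE)
BRACKET_OR_UNDERSCORE = re.compile(
    r'[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]', re.IGNORECASE)

bracket_only_id = BRACKET_ONLY.search(HYPOTHETICAL).group('id')
either_id = BRACKET_OR_UNDERSCORE.search(HYPOTHETICAL).group('id')

print(bracket_only_id)  # -> oyWJXX1CZiE
print(either_id)        # -> performance
```

Any 11-character word that happens to sit between two underscores is picked up first, so ordinary title words can be mistaken for an ID.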

ZeroQI commented 1 year ago

I am really bad at regex, and don't use the agent myself so tricky to reproduce...

https://regex101.com/r/BFKkGc/3/

_vp9-142kbit_ is exactly 11 characters long between underscores, that is why it fails...

This may work (anchor the beginning and end, and add an unnamed group to consume whatever comes in front):
YOUTUBE_VIDEO_REGEX = Regex('^(\w+?)[\[](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]]$', Regex.IGNORECASE)

MizeryMade commented 1 year ago

After doing some tinkering, I think I've found a simple and working solution:

(?:^\d{8}_|\[(?:youtube\-)?)(?P<id>[a-z0-9\-_]{11})(?:\]|_)

Haven't done extensive testing with it, but it does appear to be working with a couple of variations of the file used in my original example:

INFO (logkit:16) - populate_episode_metadata_from_api() - filename: 20221213_oyWJXX1CZiE_{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} - Copy.mkv
INFO (logkit:16) - # videoId [oyWJXX1CZiE] not in Playlist/channel item list so loading json_video_details
[...]
INFO (logkit:16) - populate_episode_metadata_from_api() - filename: 20221213_oyWJXX1CZiE_{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
INFO (logkit:16) - # videoId [oyWJXX1CZiE] not in Playlist/channel item list so loading json_video_details
[...]
INFO (logkit:16) - populate_episode_metadata_from_api() - filename: {Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
INFO (logkit:16) - # videoId [oyWJXX1CZiE] not in Playlist/channel item list so loading json_video_details

Could certainly be some cases that break the RegEx, but again with my limited tinkering/testing it seems to be performing as expected with both naming conventions.
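For anyone wanting to check the proposed pattern without a Plex install, here is a small sketch using Python's `re` in place of the agent's `Regex` helper (an assumption). The TubeArchivist-style name below assumes the `<8-digit date>_<id>_` prefix implied by the `^\d{8}_` alternative; `extract_video_id` is a hypothetical helper, not a function from the agent.

```python
import re

# Proposed pattern from this comment: date-prefixed ID at the start, or bracketed ID anywhere.
PROPOSED = re.compile(
    r'(?:^\d{8}_|\[(?:youtube\-)?)(?P<id>[a-z0-9\-_]{11})(?:\]|_)', re.IGNORECASE)

def extract_video_id(filename):
    """Return the first video ID found in filename, or None (hypothetical helper)."""
    match = PROPOSED.search(filename)
    return match.group('id') if match else None

TAIL = ("{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: "
        "A Deep Dive {1080p_30fps_vp9-142kbit_opus}")

bracket_style = TAIL + " ~ [oyWJXX1CZiE].mkv"
date_prefixed = "20221213_oyWJXX1CZiE_" + TAIL + " - Copy.mkv"  # assumed TubeArchivist style
youtube_prefixed = "Some title [youtube-oyWJXX1CZiE].mkv"       # made-up example
no_id = TAIL + ".mkv"

for name in (bracket_style, date_prefixed, youtube_prefixed, no_id):
    print(extract_video_id(name))
# -> oyWJXX1CZiE
# -> oyWJXX1CZiE
# -> oyWJXX1CZiE
# -> None
```

Because the underscore form is only accepted directly after an 8-digit prefix at the start of the name, `_vp9-142kbit_` in the middle of the filename is no longer picked up.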