Closed MizeryMade closed 1 year ago
Old YOUTUBE_VIDEO_REGEX = Regex('\[(?:youtube\-)?(?P<id>[a-z0-9\-_]{11})\]', Regex.IGNORECASE)
New YOUTUBE_VIDEO_REGEX = Regex('[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]', Regex.IGNORECASE)
The new regex yields videoId [vp9-142kbit], which is wrong; it should be videoId [oyWJXX1CZiE]
File: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
It's the underscores that mess it up, but they are supposed to be at the beginning and the end...
https://github.com/ZeroQI/YouTube-Agent.bundle/blob/6be09f20fcede15fdf7a6c228e442433a299e938/Contents/Code/__init__.py#L574
Replace line 574 with the line below and report back, please. I do not use this agent myself (despite being the main coder), so I am correcting blind:
YOUTUBE_VIDEO_REGEX = Regex('^[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]$', Regex.IGNORECASE)
That change results in it being unable to find a VideoID in the filename.
2023-01-16 17:05:21,092 (44a0) : INFO (logkit:16) - populate_episode_metadata_from_api() - filename: {Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
2023-01-16 17:05:21,092 (44a0) : INFO (logkit:16) - videoId not found in filename
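For anyone following along, here is a minimal sketch of why the anchored version finds nothing. It assumes Plex's `Regex(...)` behaves like Python's `re.compile(...)` and that the agent searches the whole filename; both are assumptions on my part, not something verified against the agent code.

```python
import re

# Assumption: Plex's Regex(...) wrapper behaves like re.compile(...).
ANCHORED = re.compile(r'^[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]$', re.IGNORECASE)

filename = ('{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: '
            'A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv')

# ^ and $ force the ID (plus its delimiters) to be the ENTIRE string,
# so a full filename can never match.
whole_name = ANCHORED.search(filename)

# Only a string that is nothing but the delimited ID matches.
bare_id = ANCHORED.search('[oyWJXX1CZiE]')

print(whole_name)           # None
print(bare_id.group('id'))  # oyWJXX1CZiE
```

So the anchors fix the false positive only by ruling out every real filename.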
I didn't dig too far into the project, but it looks like TubeArchivist is hardcoded to make YT-DLP download files with the date and VideoID at the beginning of the filename, such as "20221213_oyWJXX1CZiE_". The change made to support that will pick out any string of 11 characters sandwiched between 2 underscores, whereas your script previously expected the ID to be contained in square brackets.
Granted, my inclusion of the encoder and other information isn't exactly standard compared to your docs, but "--restrict-filenames" is suggested, which if I recall correctly replaces spaces with underscores in the filename, and that could lead to others having issues. I'm not sure there is a way to support your established requirement of having the ID in brackets while also supporting TubeArchivist's approach of just sandwiching it between underscores.
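To illustrate the concern about underscores replacing spaces, here is a rough sketch. The filename is hypothetical (made up for illustration, not taken from a real download), and `re.compile` stands in for Plex's `Regex`, which is an assumption.

```python
import re

# The regex introduced for TubeArchivist compatibility.
NEW = re.compile(r'[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]', re.IGNORECASE)

# Hypothetical filename where spaces have become underscores.
restricted_name = 'Some_Interesting_Documentary_[oyWJXX1CZiE].mkv'

# "Interesting" happens to be 11 characters long and sits between two
# underscores, so it is picked up before the bracketed ID is reached.
picked = NEW.search(restricted_name).group('id')

print(picked)  # Interesting
```

Any 11-character word in a title would trigger the same false match once spaces are turned into underscores.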
I am really bad at regex and don't use the agent myself, so this is tricky to reproduce...
https://regex101.com/r/BFKkGc/3/
_vp9-142kbit_ is exactly 11 characters long between underscores; that is why it fails...
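A quick way to see this outside of Plex, as a sketch that assumes `Regex(...)` maps onto Python's `re.compile(...)`:

```python
import re

# Old and new patterns, copied from the top of this issue.
OLD = re.compile(r'\[(?:youtube\-)?(?P<id>[a-z0-9\-_]{11})\]', re.IGNORECASE)
NEW = re.compile(r'[\[_](?:youtube\-)?(?P<id>[a-z0-9\-_]{11})[\]_]', re.IGNORECASE)

filename = 'A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv'

# search() returns the leftmost match. With NEW, the underscore-delimited
# encoder tag comes before the bracketed ID and is exactly 11 characters.
old_id = OLD.search(filename).group('id')
new_id = NEW.search(filename).group('id')

print(old_id)       # oyWJXX1CZiE
print(new_id)       # vp9-142kbit
print(len(new_id))  # 11
```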
This may work (anchor the beginning and end, and add a lazy unnamed group to consume whatever comes in front):
YOUTUBE_VIDEO_REGEX = Regex('^(\w+?)[\[_](?:youtube\-)?(?P
After doing some tinkering, I think I've found a simple and working solution:
(?:^\d{8}_|\[(?:youtube\-)?)(?P<id>[a-z0-9\-_]{11})(?:\]|_)
Haven't done extensive testing with it, but it does appear to be working with a couple of variations of the file used in my original example:
INFO (logkit:16) - populate_episode_metadata_from_api() - filename: 20221213_oyWJXX1CZiE_{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} - Copy.mkv
INFO (logkit:16) - # videoId [oyWJXX1CZiE] not in Playlist/channel item list so loading json_video_details
[...]
INFO (logkit:16) - populate_episode_metadata_from_api() - filename: 20221213_oyWJXX1CZiE_{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
INFO (logkit:16) - # videoId [oyWJXX1CZiE] not in Playlist/channel item list so loading json_video_details
[...]
INFO (logkit:16) - populate_episode_metadata_from_api() - filename: {Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv
INFO (logkit:16) - # videoId [oyWJXX1CZiE] not in Playlist/channel item list so loading json_video_details
Could certainly be some cases that break the RegEx, but again with my limited tinkering/testing it seems to be performing as expected with both naming conventions.
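For anyone who wants to poke at the proposed pattern without a Plex install, here is a small sketch. `extract_video_id` is a hypothetical helper, not a function from the agent, and the TubeArchivist-style name assumes the `YYYYMMDD_<id>_` prefix described earlier in this thread.

```python
import re

# Proposed pattern: ID after an 8-digit date prefix at the start of the
# name, or inside square brackets (optionally prefixed with "youtube-").
PROPOSED = re.compile(
    r'(?:^\d{8}_|\[(?:youtube\-)?)(?P<id>[a-z0-9\-_]{11})(?:\]|_)',
    re.IGNORECASE,
)

def extract_video_id(filename):
    """Hypothetical helper: return the video ID, or None if not found."""
    match = PROPOSED.search(filename)
    return match.group('id') if match else None

bracket_style = ('{Clout Cancun} ~ 2022-12-13 ~ ~ When Rap & Wrestling Collide: '
                 'A Deep Dive {1080p_30fps_vp9-142kbit_opus} ~ [oyWJXX1CZiE].mkv')
# Assumed TubeArchivist-style name (date, underscore, ID, underscore).
archivist_style = ('20221213_oyWJXX1CZiE_{Clout Cancun} ~ 2022-12-13 ~ ~ A Deep Dive '
                   '{1080p_30fps_vp9-142kbit_opus} - Copy.mkv')
youtube_prefix = 'Some Title [youtube-oyWJXX1CZiE].mkv'
no_id = 'Some Title {1080p_30fps_vp9-142kbit_opus}.mkv'

for name in (bracket_style, archivist_style, youtube_prefix, no_id):
    print(extract_video_id(name))
```

The underscore form is only accepted right after a leading 8-digit date, which is what keeps `_vp9-142kbit_` in the middle of the name from matching. A filename that starts with 8 digits and an unrelated 11-character token would still fool it.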
It seems the change for "Make YouTube regex compatible with TubeArchivist" has broken the functionality for cases where the script was previously working fine. Maybe the RegEx needs another look?
CURRENT:
REVERTED: