ZeroQI / Absolute-Series-Scanner

Seasons, absolute mode, Subfolders...

Youtube - Generate episode number by using file date #343

Closed: reddragonguy closed this issue 3 years ago

reddragonguy commented 3 years ago

Is your feature request related to a problem? Please describe.
Update the code for YouTube, and maybe other sources, to generate the episode ID based on the file date.

Describe the solution you'd like
Here's the code I modified locally:

### YouTube Channel ###
    files_per_date = []
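    # relies on scanner globals and helpers: extension, natural_sort_key, VIDEO_EXTS, SW_YOUTUBE_DATE, add_episode_into_plex, Log
    # alternative condition: or not json_playlist and not json_full and source.startswith('youtube') and len(id)>2 and id[0:2] in ('PL', 'UU', 'FL', 'LP', 'RD')
    if source.startswith('youtube') and (id.startswith('UC') or id.startswith('HC')):  # parentheses needed: 'and' binds tighter than 'or'
      files_per_date = [os.path.basename(file) for file in sorted(files, key=natural_sort_key if SW_YOUTUBE_DATE else os.path.getmtime)]  # sort full paths so getmtime can stat them; to have latest ep first, add: ", reverse=True"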
      for file in files_per_date:
        if extension(file) not in VIDEO_EXTS or os.path.isdir(os.path.join(root, path, file)):  continue  #only files with video extensions
        filename, folder_season = os.path.basename(file), 1  # sometime youtube-dl gets bad upload date and sets date to NA, mark these as season 0 (specials) so user notices and can fix if wanted

        if SW_YOUTUBE_DATE:
          match = re.match(r"([12]\d{3}[-. ]?(0[1-9]|1[0-2])[-. ]?(0[1-9]|[12]\d|3[01]))",filename)
          if match:  folder_season = int(filename[0:4])  # file starts with "yyyy-mm-dd" "yyyy.mm.dd" "yyyy mm dd" or "yyyymmdd", so take first four digits as the season year
          ep = files_per_date.index(filename)+1 if filename in files_per_date else 0
          Log.info(u'Youtube folder season date {}, season: {}, ep: {}, file: {}'.format("regex" if match else "error set season 0", folder_season, ep, filename))
        else:
          #folder_season = time.gmtime(os.path.getmtime(os.path.join(root, path, filename)))[0] if source=='youtube2' else folder_season or 1 # no info from file or flag not set, revert to original way of reading the file date
          filedate = time.gmtime(os.path.getmtime(os.path.join(root, path, filename)))
          folder_season = filedate[0]
          month = filedate[1]
          day = filedate[2]
          hour = filedate[3]
          minute = filedate[4]
          ep = month * 1000000 + day * 10000 + hour * 100 + minute
          Log.info(u'Youtube folder season gmtime,  season: {}, ep: {}, file: {}'.format(folder_season, ep, filename))
        add_episode_into_plex(media, file, root, path, folder_show if id in folder_show else folder_show+'['+id+']', int(folder_season if folder_season is not None else 1), ep, filename, folder_season, ep, 'YouTube', tvdb_mapping, unknown_series_length, offset_season, offset_episode, mappingList)
        continue
      return
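The `else` branch above packs the file's UTC modification time into a single MMDDHHMM integer. A minimal standalone sketch of that encoding, using only the standard library (the function name `mtime_episode_number` is hypothetical, not part of the scanner), which also shows the largest value it can produce:

```python
import time


def mtime_episode_number(mtime):
    """Encode a POSIX timestamp as (season, episode) = (year, MMDDHHMM), in UTC.

    Same arithmetic as the snippet above:
    month * 1000000 + day * 10000 + hour * 100 + minute.
    """
    t = time.gmtime(mtime)
    episode = t.tm_mon * 1000000 + t.tm_mday * 10000 + t.tm_hour * 100 + t.tm_min
    return t.tm_year, episode


# 1615912238 is 2021-03-16 16:30:38 UTC
season, episode = mtime_episode_number(1615912238)
print(season, episode)  # 2021 3161630

# Largest possible value: December 31, 23:59
print(12 * 1000000 + 31 * 10000 + 23 * 100 + 59)  # 12312359
```

Seconds are dropped, so two files whose modification times fall in the same UTC minute would still get the same episode number.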

Describe alternatives you've considered
Adding S**E*** manually or through another script.

Additional context
The reason this is needed is that I tend to delete videos after I watch them, and the current mechanism is based on the number of files in the directory. This approach is more consistent and also helps reduce conflicts.

ZeroQI commented 3 years ago

Ideally the episode number would be fixed, but I am not sure Plex can reliably handle episode numbers in that range over time without database corruption... In that particular use case, I would add the videos to Watch Later on YouTube and stream them directly...

The current mechanism is:

In channel mode, we could assign the season based on the year (in youtube2 mode). The alternatives are:

Do you mean conflicts like leftover screenshots and metadata? That doesn't happen if you remove your files, run "Delete Trash" in the library, then add the new files and scan again...

Need to think about it for a bit

reddragonguy commented 3 years ago

I currently store videos in channel mode. Videos are downloaded nightly. In Plex, I configure some channels to delete videos after they've been watched, either the next day, the next week, or on the next sync. Since both of these are automated, I wouldn't be able to consistently run "Delete Trash" or keep track of that.

With the current structure, every day a new video would show up as Episode 1. Because of this, the episode would show as already played and could eventually be deleted before I even see it.

I tried to work around this by using playlist mode: instead of downloading into a channel folder, I downloaded into the "Uploads" playlist (the channel ID with a UU prefix). I made a tweak to the API call and this worked for a while, but eventually I'd get duplicate episode IDs if the channel changed its uploads in some way, for example by deleting videos.
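The "Uploads" playlist trick relies on a widely used convention: the playlist ID is the channel ID with the leading `UC` replaced by `UU`. A small sketch under that assumption; it is a convention rather than a documented guarantee (the YouTube Data API exposes the authoritative value as `contentDetails.relatedPlaylists.uploads` on a `channels` resource), and `uploads_playlist_id` is a hypothetical helper:

```python
def uploads_playlist_id(channel_id):
    """Return the 'Uploads' playlist ID for a channel ID (UC... -> UU...).

    Assumes the UC/UU prefix convention; raises ValueError for other IDs.
    """
    if not channel_id.startswith('UC'):
        raise ValueError('expected a channel ID starting with UC: %r' % channel_id)
    return 'UU' + channel_id[2:]


print(uploads_playlist_id('UCgPClNr5VSYC3syrDUIlzLw'))  # UUgPClNr5VSYC3syrDUIlzLw
```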

I also have some channels that may upload multiple videos per day, which is why I came up with the hack I did.

I like your suggestion of the DDDx format, as I think history or rank are a little error-prone for channel mode, but I thought finding x might not be worth it, or might be complicated given my limited Python skills. Also, the ID could change on a "reload" depending on how many videos from the same day were available.

I also have an issue, which may have been reported before, where the channel's poster gets filled with episode images and eventually changes to an episode image.

Thanks for deciding to think about this. I chose my approach because it didn't require any additional API calls, and as long as the video is saved with the upload date (the file date is sometimes the last-modified date instead, which could be addressed with the --no-mtime flag), it's a highly consistent naming pattern. I wasn't aware of possible DB corruption due to the high numbers, though. Has that proven to be an issue in the past?

ZeroQI commented 3 years ago

Automated like that, it makes perfect sense...

I am not sure about the corruption, but that's not a scenario Plex would have tested in depth, and we are really in uncharted territory with grouping folders in the scanner, so I try to stay cautious...

The cleanest option to date is to put the date in the filename and use SW_YOUTUBE_DATE.

We could just add per-date support for YouTube (the same way Plex supports date-based shows), and files with the same date are added separately (tested it!), and could

Scanner code principle:

            tv_show = Media.Episode(show, year, None, None, None)  #use year as season
            tv_show.released_at = '%d-%02d-%02d' % (year, month, day)
            tv_show.parts.append(i)
            mediaList.append(tv_show)

Agent code principle:

    originally_available_at = parse_date(episode_info['firstAired'])
    date_based_season = originally_available_at.year
    if date_based_season in media.seasons and originally_available_at in media.seasons[date_based_season].episodes:
      media.seasons[date_based_season].episodes[originally_available_at].title = 'test'
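Stripped of the Plex framework, the two principles above reduce to this: the scanner writes a `YYYY-MM-DD` release date and uses the year as the season, and the agent parses that date back and uses it as the episode key. A plain-Python sketch, in which `datetime.strptime` stands in for the agent's `parse_date` and `date_based_key` is a hypothetical helper:

```python
from datetime import datetime


def date_based_key(first_aired):
    """Map a 'YYYY-MM-DD' air date to (season, episode_key): year as season, date as key."""
    aired = datetime.strptime(first_aired, '%Y-%m-%d').date()
    return aired.year, aired


# Scanner side: build released_at the same way as the scanner principle above.
year, month, day = 2021, 3, 16
released_at = '%d-%02d-%02d' % (year, month, day)

# Agent side: the same string yields the season and the per-episode key.
season, key = date_based_key(released_at)
print(released_at, season, key)  # 2021-03-16 2021 2021-03-16
```

Because the key is only a date, any two files released on the same day resolve to the same key.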

What do you think about using the file date to label channel videos by date?

reddragonguy commented 3 years ago

I think there may be an issue. I use this output template to name the files: %(uploader)s [%(channel_id)s]/%(upload_date)s - %(title)s [%(id)s].%(ext)s.

The upload_date is output as YYYYMMDD, so I don't know if that will work.

Also, it would be helpful to still see the episode title for channels whose episode thumbnails don't show a title or anything else helpful about the video.

Hopefully I'm not misunderstanding your suggestion, but I like the idea as long as the original episode title can still be seen.
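Since that template puts an 8-digit YYYYMMDD date first and the video ID in square brackets last, the date, title, and ID can all be recovered from the filename, and the date rewritten as YYYY-MM-DD. A sketch assuming exactly that template; the regex and `parse_ytdl_name` are hypothetical, not the scanner's code, and the 11-character ID pattern is an assumption based on current YouTube video IDs:

```python
import re
from datetime import datetime

# Names produced by "%(upload_date)s - %(title)s [%(id)s].%(ext)s"
YTDL_NAME = re.compile(
    r'^(?P<date>\d{8}) - (?P<title>.*) \[(?P<id>[A-Za-z0-9_-]{11})\]\.(?P<ext>\w+)$')


def parse_ytdl_name(filename):
    """Return (iso_date, title, video_id), or None if the name does not match."""
    m = YTDL_NAME.match(filename)
    if m is None:
        return None
    day = datetime.strptime(m.group('date'), '%Y%m%d').date()
    return day.isoformat(), m.group('title'), m.group('id')


print(parse_ytdl_name("20210315 - LITHIUM 'Skate Park' _ adult swim smalls [TqR1hYzuAi0].mp4"))
```

The title stays available as its own field, so a date-based episode key does not have to replace it.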

micahmo commented 3 years ago

Just popping in to follow this thread. :-) I tried the date-based naming scheme recommended by the Plex docs and it seems to work nicely. I still see the episode title. It would require a different naming scheme, though (Plex looks for dashes, dots, or spaces in the date).

Otherwise, no other comment. It would definitely be nice to reuse the Plex functionality that handles date-based episodes. Agree that very high episode IDs like ep = month * 1000000 + day * 10000 + hour * 100 + minute, and other ways of calculating a unique episode ID, could get tricky. :-)

reddragonguy commented 3 years ago

@micahmo this already works, as I'm using it with no issues. There is a concern that this may fail in the future, but I don't think it will. Long-term use is obviously untested, since I only made the change locally yesterday.

bcc32 commented 3 years ago

I have a branch where I set the episode number to the modified time (Unix timestamp) of the file. It seems to work okay, no issues so far with the large episode numbers, and no collisions :)
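A minimal sketch of the approach described here: use the file's modification time, in whole seconds since the Unix epoch, as the episode number. This is not code from that branch; `timestamp_episode_number` is a hypothetical name:

```python
import os
import tempfile


def timestamp_episode_number(path):
    """Episode number = file modification time in whole seconds since the Unix epoch."""
    return int(os.path.getmtime(path))


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    os.utime(path, (1615912238, 1615912238))  # (atime, mtime) = 2021-03-16 16:30:38 UTC
    print(timestamp_episode_number(path))  # 1615912238
finally:
    os.remove(path)
```

Current timestamps are around 1.6 billion, which still fits in a signed 32-bit integer (maximum 2,147,483,647, reached in January 2038); how Plex stores episode numbers internally is not established in this thread. Two files collide only if their modification times fall in the same second.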

ZeroQI commented 3 years ago

I will code this for channel and non-channel YouTube videos:

This will avoid meaningless episode numbers and API calls, and will look neat without marking videos as already watched when episode numbers are reused.

ZeroQI commented 3 years ago

Updated scanner

Updated agent

Please review and report

reddragonguy commented 3 years ago

Just tried the update. Everything mostly seems to work fine. I did find one issue: a filename started with 20170122, but the file date was 2021-03-12, and the file ended up in season 2021 instead of 2017.
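The expected behaviour here, where a date at the start of the filename takes precedence over the filesystem date, can be sketched as follows. It reuses the date regex from the first snippet in this thread (with the year captured as a group); `season_for` is a hypothetical helper, not the scanner's actual function:

```python
import os
import re
import time

# Leading date in the filename: yyyy-mm-dd, yyyy.mm.dd, yyyy mm dd or yyyymmdd
LEADING_DATE = re.compile(r'^([12]\d{3})[-. ]?(0[1-9]|1[0-2])[-. ]?(0[1-9]|[12]\d|3[01])')


def season_for(filename, mtime):
    """Prefer the year written in the filename; fall back to the file's UTC mtime year."""
    match = LEADING_DATE.match(os.path.basename(filename))
    if match:
        return int(match.group(1)), 'filename'
    return time.gmtime(mtime).tm_year, 'file date'


# File named for 2017 but modified on 2021-03-12 (1615507200): the filename wins.
print(season_for('20170122 - clip [abcdefghijk].mp4', 1615507200))
print(season_for('clip.mp4', 1615507200))
```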

Also, this may be an agent issue rather than a scanner one, but the following file name/channel doesn't work:

/media/youtube/ATHLEAN-X™ [UCe0TLA0EsQbE-MjuHXevj2A]/20210228 - How to Fix a Headache in 90 Seconds Flat! (JUST DO THIS) [Aom1uaythK8].mp4.

Neither the channel nor the episodes display any data. I've looked in the logs but can't find anything. Can you rename a folder/file like the one above and have a look?

reddragonguy commented 3 years ago

Okay, a new set of videos was just downloaded. For one channel, it originally had this video:

/media/youtube/Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]/20210315 - LITHIUM 'Skate Park' _ adult swim smalls [TqR1hYzuAi0].mp4

Now, a new video was added:

/media/youtube/Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]/20210316 - FIRST LOOK - Birdgirl _ April 4 @ Midnight _ adult swim [vbZzDOgwdfg].mp4

So, here are the timestamps:

[screenshot: file timestamps]

And this is the Plex display. Both videos have the poster of the first episode, and the title and date of the second episode:

[screenshot: Plex display]

Maybe something else has to be passed to Plex to ensure each video is unique.

reddragonguy commented 3 years ago

Here's the agent log I found that processed the new video: youtube adult swim.txt

I think the time may need to be included to help with uniqueness: metadata.seasons[2021].episodes[2021-03-16], or maybe just use the YouTube ID. As long as videos sort correctly by date and we avoid duplicates, I think that's all that matters.

ZeroQI commented 3 years ago

I see one episode in the logs, along with: "Format for date in title not supported yet: 20210315 - xxx.ext"

=============================================================================================================================================================
Call: "Plex", path: "Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]", folder_show: "Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]", dirs (0), files (2)
=============================================================================================================================================================
Forced ID (series folder) - source: "youtube", id: "UCgPClNr5VSYC3syrDUIlzLw"
"Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]" s2021e 2021-3-17                         "YouTube file date" "C:\Users\benja\Videos\_Plex3\Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]\20210315 - LITHIUM 'Skate Park' _ adult swim smalls [TqR1hYzuAi0].mp4" "20210315 - LITHIUM 'Skate Park' _ adult swim smalls [TqR1hYzuAi0].mp4"
"Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]" s2021e 2021-3-17                         "YouTube file date" "C:\Users\benja\Videos\_Plex3\Adult Swim [UCgPClNr5VSYC3syrDUIlzLw]\20210316 - FIRST LOOK - Birdgirl _ April 4 @ Midnight _ adult swim [vbZzDOgwdfg].mp4" "20210316 - FIRST LOOK - Birdgirl _ April 4 @ Midnight _ adult swim [vbZzDOgwdfg].mp4"

I updated the ASS date format so it now uses the filename format, BUT once the agent loads, the same metadata goes on both files. After checking, both files came out on the same day, so when the date updates, both episodes point to the same episode date, and hence the same metadata, even though Plex knows there are 2 episodes. After changing and fixing the date in the GUI, the metadata updates fine.

reddragonguy commented 3 years ago

In the screenshot I posted above, one video was from 3/16 and the other was from 3/15.

I'm assuming there's only one in the log because the first video was already there. The new video was added, and I'd assume that wouldn't cause the agent to reprocess the previous video. But after the new video is sent to Plex, the agent updates it, and those updates end up affecting the older video.

So, I'd suggest the following test:

ZeroQI commented 3 years ago

Because episode.originally_available_at gets updated with the YouTube date, which is the same day for both: "2021-03-16T00:00:04Z" and "2021-03-16T16:30:38Z".
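Both timestamps fall on 2021-03-16, so once the time of day is dropped they map to the same date. A toy model of the effect, using a plain dict keyed by date (an illustration only, not Plex's actual metadata store; the titles are placeholders):

```python
from datetime import datetime

# The two published timestamps quoted above (UTC, ISO 8601 with a trailing Z).
published = ["2021-03-16T00:00:04Z", "2021-03-16T16:30:38Z"]


def to_date(stamp):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp and keep only the calendar date."""
    return datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ').date()


# Toy store keyed by date only: the second write overwrites the first.
metadata_by_date = {}
for title, stamp in zip(['first video', 'second video'], published):
    metadata_by_date[to_date(stamp)] = title

print(len(metadata_by_date), metadata_by_date)
```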

I can prevent the agent from updating the date, which is OK IF the dates added are different; but if the file date is the same, all episodes with that date get the same metadata. The same happens if the metadata updates to a matching date, and only one episode is seen by the agent...

Seems like a Plex issue with date-based shows...

reddragonguy commented 3 years ago

Ahh, interesting. OK, I didn't look at the API calls, just the dates on the file system.

If you have the time and the bandwidth, can you try to see if you can pass in a time with the date: 3/16/2021 16:00 and see if Plex handles that ok?

reddragonguy commented 3 years ago

I've tried changing ASS to include time and it didn't work. Thank you for the effort, but I'm going to change the code to calculate an episode id so that I can avoid any possible duplicates and confusing metadata per video.

ZeroQI commented 3 years ago

Plex support for date-based shows is date only, and it does create different episodes for the same date, but the agent part of Plex then mixes them all together... I just added the date-based show format and didn't expect Plex to add the episodes correctly but then fail to update them. The format doesn't allow for a time. If my guess is correct, somebody with Plex Pass adding date-based shows whose episodes share the same date should have the same issue (looping through the episodes shows only the first one with that date).

I will try to add the YouTube ID to the date and see if this flies, and try to solve the Plex metadata issue since the scanner part works; if not, I might have to revert the code.

reddragonguy commented 3 years ago

I have Plex Pass.

ZeroQI commented 3 years ago

Tried, and I cannot support multiple files on the same day due to the Plex agent's metadata handling; the scanner part works OK. Your issue would be solved if you can find a series with two episodes released on the same day, label them as such using real media files and TheTVDB, and, if the bug is present, create an official ticket with Plex. When it is resolved, this bug should be resolved too, or I can change the code to do so...

reddragonguy commented 3 years ago

I would humbly suggest that if the code can reduce the likelihood of false bugs or avoid exposing Plex limitations, it should do so, since it is not unusual for YouTube videos to be published on the same date. Therefore, I'd suggest keeping the date code for most people, but maybe adding an option to generate episode IDs to help guarantee the uniqueness of videos. Maybe even allow it on a per-channel basis through a config or marker file in the folder.
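The per-channel opt-in suggested here could be as simple as checking for a marker file in the channel folder. A sketch in which the marker name `youtube-episode-ids.txt` and the function `numbering_mode` are hypothetical; the scanner does not look for any such file:

```python
import os
import tempfile

# Hypothetical marker file name (not recognised by the scanner).
MARKER = 'youtube-episode-ids.txt'


def numbering_mode(channel_folder):
    """Return 'generated' when the channel folder holds the marker file, otherwise 'date'."""
    if os.path.isfile(os.path.join(channel_folder, MARKER)):
        return 'generated'
    return 'date'


folder = tempfile.mkdtemp()
print(numbering_mode(folder))  # date
open(os.path.join(folder, MARKER), 'w').close()
print(numbering_mode(folder))  # generated
os.remove(os.path.join(folder, MARKER))
os.rmdir(folder)
```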

Either way, I really appreciate what you've done so far and hope it will help others!! I'll just have to remember to apply my "hack" to any new release.

ZeroQI commented 3 years ago

Channel date mode works as long as there aren't two files with the same date, either in the last-modification date or in the filename; in that case the conflicting episodes share the same episode key and metadata, but can be viewed separately...

I could reuse the youtube2/3 modes to use a generated episode number, or take the rank in the channel items list; that needs an API call, but would allow filling in titles for folders with mixed YouTube videos... Need to think about it; date-based seems good if there are no duplicates.

ZeroQI commented 3 years ago

Tried a Plex date-based series, and the TVDB agent can update the episode number afterwards, good to know...

I have to test whether changing the episode number from date-based numbering actually makes the episode key unique again, in which case I can use the channel rank ID in the agent and no youtube2/3 mode is needed any more... To be tested. We might be lucky...

ZeroQI commented 3 years ago

Test release: https://gist.github.com/ZeroQI/03e4dbb4f3805305adf8947d7e03c901

For now, youtube2 is mapped to your mode. Normal mode is date-based, using the date in the filename if present, otherwise the file date on disk. After testing, the agent can change the episode number (to the channel items? At least TheTVDB agent could give a proper episode number). The other scanner, which handles the YYYY-MM-DD_1 format, assigns this DDDI format:

    if 'episodeIndex' in match.groupdict() and match.group('episodeIndex') is not None:
        episodeIndex = int(match.group('episodeIndex').strip())
        log('setValues', 'episode contains index %s', episodeIndex)
    episodeIndexAsString = format(episodeIndex, '02')
    self.episodeNumber = int(dayOfYear + str(episodeIndexAsString))
    log('setValues', 'episode number %s', self.episodeNumber)
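The DDDI scheme in that excerpt concatenates the day of the year with a two-digit per-day index and converts the result to an integer. A standalone sketch of the same idea; `day_of_year_episode` is a hypothetical name, and computing the day of year from a `date` is an assumption, since the excerpt does not show where `dayOfYear` comes from:

```python
from datetime import date


def day_of_year_episode(day, index):
    """Build a DDDII episode number: 3-digit day of year followed by a 2-digit per-day index."""
    return int('{:03d}{:02d}'.format(day.timetuple().tm_yday, index))


print(day_of_year_episode(date(2021, 3, 16), 1))  # 7501 (day 075, index 01)
```

With two index digits, at most 99 files per day can be numbered, and the largest value is 36699 (day 366 of a leap year, index 99).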

I was thinking of no tvdb2 mode, but if multiple files have the same date, use MMDDII so it is automatic:

@reddragonguy What do you think ?

reddragonguy commented 3 years ago

I've tested the following 6 scenarios with Looper which drops multiple videos on the same day:

[screenshot: the six Looper test libraries]

So, they all have 20 files in them, but show very differently in Plex.

So, Looper18 and Looper21 are the libraries I want, regardless of how I label the files. Looper17, Looper19, and Looper20 are out because they show the same metadata, and if I mark one video as played, the others change status too. Looper22 was an odd one: it showed some videos with the same date with different titles and the same image, and when I marked one as played it sometimes didn't update the others.

ZeroQI commented 3 years ago

Which release did you run the tests on: https://gist.github.com/ZeroQI/03e4dbb4f3805305adf8947d7e03c901 or the latest master code?

reddragonguy commented 3 years ago

I used the code from the gist

ZeroQI commented 3 years ago

I cleaned up the show variable in the add-episode function, which had broken forced IDs. Pushed a fix to master: youtube2 mode (duplicate dates) should work, and youtube3 mode no longer exists. If you wait a day, I will have coded a fix that needs no mode in any case.

ZeroQI commented 3 years ago

Posted a new version to master for testing.

reddragonguy commented 3 years ago

I've looked at the recent commits. I'm not sure they'll help with the following scenario, which happens with Looper:

Wouldn't this cause the new video that is added with just a date to show as already played? I think it would still help to allow episode IDs to be generated from the full date and time of the video, to eliminate duplicates as much as possible, since it is extremely unlikely that 2 videos uploaded to a channel have the exact same date and time.

ZeroQI commented 3 years ago

Test it, please; I lost 4 hours of sleep on this code... When there are duplicates, it automatically uses MMDDxx, where xx is an incremental value; this is based on your code. It works by building a dict: when duplicates exist, the first episode's filename entry is replaced with an index counter and the filename is moved to the xx=01 slot... When not needed, it uses date-based numbering. It won't reuse episode numbers, and multiple episodes released on the same day are handled.

        if not Dict(mapping, season, episode):  SaveDict(filename, mapping, season, episode)
        else:
          if isinstance(Dict(mapping, season, episode), int):  index = Dict(mapping, season, episode) + 1
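          else:
            index = 1  # bug: the 1st duplicate was just moved to '01', so this must be 2 (corrected later in this thread)
            SaveDict(Dict(mapping, season, episode), mapping, season, episode+'01')  #save the 1st filename under the duplicate episode naming convention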
            Log.info('- Moving 1st duplicate season: {}, episode: {}'.format(season, episode))
          SaveDict(index,    mapping, season, episode)
          SaveDict(filename, mapping, season, episode + '{:02d}'.format(index))

      for season in mapping or []:  #to have latest ep first, add: ", reverse=True"
        for episode in mapping[season] or []:
          filename = Dict(mapping, season, episode)
          if not isinstance(filename, int):
            add_episode_into_plex(media, filename, root, path, folder_show if not id or id in folder_show else folder_show+'['+id+']', int(season), episode, os.path.basename(filename), season, "", "Youtube Date", tvdb_mapping, unknown_series_length, offset_season, offset_episode, mappingList)
      return
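The duplicate handling described above can be restated without the scanner's `Dict`/`SaveDict` helpers. A simplified sketch, not the scanner's code (`number_episodes` is hypothetical, and dates are assumed to arrive as `YYYY-MM-DD` strings): a date used by one file keeps its date-based label, and when several files share a date they all get MMDD plus a two-digit index starting at 01.

```python
from collections import OrderedDict


def number_episodes(files):
    """Assign labels to (filename, 'YYYY-MM-DD') pairs; returns {(season, label): filename}.

    Unique dates keep the date as the label; dates shared by several files
    use MMDD + 01, 02, ... in input order.
    """
    by_date = OrderedDict()
    for filename, day in files:
        by_date.setdefault(day, []).append(filename)

    episodes = {}
    for day, names in by_date.items():
        year, month, dom = day.split('-')
        if len(names) == 1:
            episodes[(int(year), day)] = names[0]
        else:
            for index, name in enumerate(names, start=1):  # first duplicate is 01, next 02, ...
                episodes[(int(year), month + dom + '{:02d}'.format(index))] = name
    return episodes


files = [('a.mp4', '2021-03-15'), ('b.mp4', '2021-03-16'), ('c.mp4', '2021-03-16')]
for key, name in sorted(number_episodes(files).items()):
    print(key, name)
```

Grouping by date first and taking the index from `enumerate` sidesteps the 1-versus-2 off-by-one in the incremental version. Note that a date's first file still changes label from the date form to MMDD01 once a second file for that date appears, which is the scenario raised earlier in the thread.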

reddragonguy commented 3 years ago

Things aren't working quite as expected. Here are the scanner logs for Looper: looper.filelist.log, looper.scanner.log

So, I have 21 files, and Plex only shows 16 episodes, 15 of them marked as unplayed, and some of the new ones don't get their metadata updated: their episode titles are just the filenames.

[screenshot: Looper library in Plex]

I really do think you are going to be better off by ALWAYS generating the episode id in order to be consistent.

ZeroQI commented 3 years ago

It is simpler, but I can't stand the high episode numbers, and I wanted date-based numbering to cover all angles. Now corrected: a '1' instead of a '2' caused it. When a duplicate exists, we reuse the duplicated date entry as the index counter and move the first filename to entry 1, so the current file in the loop must be set as index 2.