ZeroQI / Absolute-Series-Scanner

Seasons, absolute mode, Subfolders...

Reverse / sort playlist order by published date #341

Closed micahmo closed 3 years ago

micahmo commented 3 years ago

Hi @ZeroQI,

I know there has already been some talk of reversing/sorting playlists, most notably in #326. Neither of the two main suggested solutions, SW_YOUTUBE_DATE or youtube2, really solves the problem, as far as I can tell.

Originally, I thought I just wanted to reverse the order of the items in the playlist, but as you mentioned in #326, it gets tricky if the underlying playlist order ever changes. Then I realized what I really want is a way to sort the videos in the playlist by published date. (And I don't want to create a season per year.)

I decided to experiment with your idea of adding a new youtube3 source type, which can sort the items in the playlist by published date. Here's a snippet of the change.

sortByPublishedDate = source.startswith('youtube3')
. . .
for rank, video in enumerate( sorted(Dict(json_full, 'items') or {}, key=lambda item:item['contentDetails']['videoPublishedAt']) if sortByPublishedDate else Dict(json_full, 'items') or {}, start=1):
   . . .

I found that this works perfectly for my use case, so I forked and committed it for reference. See the full change here.

micahmo/Absolute-Series-Scanner@d151074

I can open a PR if it's something you want to add and/or if it fits with the structure of your project, but I totally understand if you want to add it a different way, or don't want to add this feature at all.

P.S. I had to add the playlist ID to the folder name again, separately, so that the YouTube-Agent would successfully populate the playlist name: [youtube3-{playlistid}][{playlistid}]

ZeroQI commented 3 years ago

@micahmo

https://github.com/ZeroQI/Absolute-Series-Scanner/blob/master/Scanners/Series/Absolute%20Series%20Scanner.py

Ideally, I would like each playlist episode to have its number based on the playlist rank so that it stays static, but some playlists are arranged with the latest video first. Although that kind of works for YouTube, it is wrong from a metadata standpoint, and I would like to show the oldest video first, but I don't want to create more YouTube API calls... I therefore plan to modify the playlist loading with your files

YOUTUBE_PLAYLIST_ITEMS = 'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet,contentDetails&maxResults=50&playlistId={}&key='+API_KEY
for rank, video in enumerate( sorted(Dict(json_full, 'items') or {}, key=lambda item:item['contentDetails']['videoPublishedAt']), start=1):

Can you give me the playlist id and your use case, in other words, why you want it reversed ? Do you use local json files btw?

micahmo commented 3 years ago

@ZeroQI

lines 823-857 YouTube Playlist ID: use playlist file (one each scan if in a grouping folder so recommend playlist folder at library root)

Yes, this is the area of the code that I tweaked for my use case. The way it works now (if I understand it correctly) is that it just iterates through the items in the playlist, exactly as they're returned by the API.

lines 858-860 YouTube Channels ID: sort by file system date, or date within filename if SW_YOUTUBE_DATE is set line 148

Unfortunately, I'm wanting to sort a playlist, not a channel. I also don't want seasons per year. Lastly, file system date can be set incorrectly by youtube-dl (I think it uses the added to playlist date instead of the publishing date). All the more reason to use contentDetails.videoPublishedAt. :-)

Ideally, I would like each playlist episode to have its number based on the playlist rank so that it stays static, but some playlists are arranged with the latest video first. Although that kind of works for YouTube, it is wrong from a metadata standpoint, and I would like to show the oldest video first, but I don't want to create more YouTube API calls... I therefore plan to modify the playlist loading with your files

Am I understanding correctly that you want to sort on videoPublishedAt always? It's up to you if you want to make that change, but I thought perhaps it could be optional. Maybe there's a time when someone does want the existing YouTube playlist order, even if it's not chronological. Not sure... Either way, the nice thing about using date is that it is static -- at least in terms of relative order... I guess a video could be removed from the playlist and mess up the order. :-)

BTW, I totally agree about not making more YouTube API calls, but I think the sorting could be optional without any additional calls. As long as contentDetails is included in the query string, you have the information to sort chronologically, or not.
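To illustrate the point about optional sorting without extra calls, here is a minimal sketch. The function name `order_playlist_items`, the `chronological` flag, and the sample data are made up for illustration; only the `contentDetails.videoPublishedAt` field comes from the API response discussed above.

```python
def order_playlist_items(items, chronological=False):
    """Return items in API order, or sorted by publish date when asked."""
    if not chronological:
        return list(items)
    # Timestamps in the same ISO 8601 UTC format sort correctly as plain strings.
    return sorted(items, key=lambda item: item["contentDetails"]["videoPublishedAt"])


# Hypothetical sample shaped like the 'items' list of a playlistItems response.
sample = [
    {"id": "newest", "contentDetails": {"videoPublishedAt": "2021-03-01T10:00:00Z"}},
    {"id": "oldest", "contentDetails": {"videoPublishedAt": "2019-06-15T08:30:00Z"}},
    {"id": "middle", "contentDetails": {"videoPublishedAt": "2020-01-20T12:00:00Z"}},
]

for rank, video in enumerate(order_playlist_items(sample, chronological=True), start=1):
    print(rank, video["id"])
```

Both orderings come from the same already-downloaded response, so the choice costs no additional request.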

Can you give me the playlist id and your use case, in other words, why you want it reversed ?

Yep, here you go. https://www.youtube.com/playlist?list=PLaDrN74SfdT5jEs3RCI53nBUkuURSWrhq Notice that the first video in the playlist is "Episode 25" and the last video is "Episode 1". Sorting this chronologically not only makes sense, but it keeps things manageable when they add a new video in the number 1 position.

Do you use local json files btw?

No I don't. I'm currently using TubeSync for downloading, which, while very cool, has limited options that can be passed to youtube-dl. It can write .nfo, but not .info.json. It can embed things like dates in the filename, if that's helpful...

ZeroQI commented 3 years ago

To reduce API calls, using no grouping folders is recommended, since a grouping folder is scanned again on every scan, unless local json files are present, in which case no API call is made

A setting in code would be global, but it needs to be per playlist, so sticking to new [youtubex-xxxx] modes would make sense. They would translate back to [youtube-xxxx] mode after use, as the agent works by video ID and so doesn't need to know, like anidb2 changing into tvdb after the scan for the agent...
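A rough sketch of that translation step, under assumptions: the regex, the function name `split_mode`, and the folder names are illustrative and are not taken from the scanner's code.

```python
import re

# Assumption: numbered modes appear in the folder name as [youtube2-ID], [youtube3-ID], ...
YOUTUBE_MODE_TAG = re.compile(r"\[youtube(?P<mode>[0-9]*)-(?P<id>[^\[\]]+)\]", re.IGNORECASE)


def split_mode(folder_name):
    """Return (mode, name_for_agent); mode is '' for a plain [youtube-ID] tag."""
    match = YOUTUBE_MODE_TAG.search(folder_name)
    if not match:
        return None, folder_name
    # Rewrite the tag to the plain form, so the agent never sees the mode digit.
    cleaned = YOUTUBE_MODE_TAG.sub(
        lambda m: "[youtube-{}]".format(m.group("id")), folder_name, count=1
    )
    return match.group("mode"), cleaned


print(split_mode("My Show [youtube3-PLabc123]"))
```

The scanner would keep the mode for its own sorting decision and hand only the cleaned name to the agent.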

Sorting possible for playlists:

Playlist order (default) and reversed playlist order (we do need that for playlists in reverse unless I can detect it easily): best in my opinion

'videoPublishedAt': could be useful for playlists that are not sorted chronologically (music videos?)

Channels are sorted per year by default (I thought that was the youtube2 mode, but it is actually in neither the ASS nor the YouTube agent code).

Still thinking about the best way forward

micahmo commented 3 years ago

A setting in code would be global, but it needs to be per playlist, so sticking to new [youtubex-xxxx] modes would make sense. They would translate back to [youtube-xxxx] mode after use, as the agent works by video ID and so doesn't need to know, like anidb2 changing into tvdb after the scan for the agent...

Yes, totally agreed. "Setting" is not the best word, but some way to indicate (outside of the code) how a playlist should be sorted. That was my idea with my sample implementation of youtube3-xxx that sorts chronologically...

As for the possible sorting methods, I'd vote for videoPublishedAt or date in filename, as long as it can be chronological. I'd guess that most people who ask for reversed playlist (like #326) are asking because the playlist is not chronological on YouTube. But I'm purely guessing and there may be a very good use for reverse.

Still thinking about the best way forward

Thank you, I really appreciate you taking the time to consider this case! Whatever you decide to implement will be great, I'm sure. :-)

ZeroQI commented 3 years ago

We could have used the API to reverse, but that only works for the playlist owner, so no go:

https://developers.google.com/youtube/v3/docs/search/list Order: date – Resources are sorted in reverse chronological order based on the date they were created. For your OWN playlist of course...

The use case for a YouTube library with a forced ID is playlist order, but if the latest items come first, reverse it, preferably detecting this automatically to avoid the need for a further mode selection [youtube3-xxx]

Playlistitems format:

Concept

totalResults = Dict(json, "pageInfo", "totalResults")
reverse = Datetime.ParseDate(Dict( Dict(json, "items")[0], "snippet", "publishedAt" )).date() >  Datetime.ParseDate(Dict(Dict(json, "items")[-1], "snippet", "publishedAt" )).date()
ep_number = totalResults - rank if reverse else rank+1  # rank is the 0-based position in the playlist

What do you think? I was going for 'Inspired'
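The concept above can be sketched as a plain function that runs without the Plex helpers. This is an illustration under assumptions: `number_episodes` and the sample items are made up, and the comparison uses the full `snippet.publishedAt` timestamp strings rather than the parsed dates of the concept.

```python
def number_episodes(items, total_results=None):
    """Return [(episode_number, item)], counting down when the list is newest-first."""
    items = list(items)
    if not items:
        return []
    total = total_results if total_results is not None else len(items)
    # A first entry dated later than the last entry suggests a newest-first playlist.
    reverse = items[0]["snippet"]["publishedAt"] > items[-1]["snippet"]["publishedAt"]
    return [(total - rank if reverse else rank + 1, item) for rank, item in enumerate(items)]


# Hypothetical newest-first playlist.
newest_first = [
    {"snippet": {"title": "Episode 3", "publishedAt": "2021-03-01T10:00:00Z"}},
    {"snippet": {"title": "Episode 2", "publishedAt": "2021-02-01T10:00:00Z"}},
    {"snippet": {"title": "Episode 1", "publishedAt": "2021-01-01T10:00:00Z"}},
]

for number, item in number_episodes(newest_first):
    print(number, item["snippet"]["title"])
```

Only the first and last entries are compared, so a playlist that is neither chronological nor reverse chronological is numbered in its existing order or its mirror image, whichever the two end entries suggest.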

I couldn't download the json initially: apparently my key serves 500k requests a day, but at times that is not enough for me to display a single JSON :(... [Please create and use your own YouTube API key to avoid issues]. Weird, I don't remember many donations (I do remember one in particular, though) :p

micahmo commented 3 years ago

We could have used the API to reverse, but that only works for the playlist owner, so no go:

Ahh, that's too bad! It would be nice to use the API directly for this.

Concept . . . What do you think? I was going for 'Inspired'

I like it! I tested with some of the playlists that I am interested in, and it seems to work very well for all of them.

If you're interested, here's the code I used to test.

import json
from urllib.request import urlopen
import plac

@plac.pos('apikey', help="YouTube API Key")
@plac.pos('playlistId', help="Playlist ID")
def main(apiKey, playlistId):
    PLAYLISTAPIURL = 'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet,contentDetails&maxResults=100&playlistId={}&key={}'
    request = PLAYLISTAPIURL.format(playlistId, apiKey)
    res = json.loads(urlopen(request).read())
    json_full = res['items']

    while "nextPageToken" in res:
        res = json.loads(urlopen(f"{request}&pageToken={res['nextPageToken']}").read())
        json_full.extend(res['items'])

    totalResults = res["pageInfo"]["totalResults"]
    reverse = json_full[0]["snippet"]["publishedAt"] > json_full[-1]["snippet"]["publishedAt"]

    print(f"reversed is {reverse}")

    for rank, vid in enumerate(json_full):
        epNumber = totalResults - rank if reverse else rank+1
        print(f"Episode {epNumber} ----- {vid['snippet']['title']}")

if __name__ == '__main__':
    plac.call(main)

P.S. I still plan on keeping my fork so that I can use absolute/chronological with undocumented [youtube3-xxx]. But I have a feeling that your solution of reversing if last item was added to playlist before first should solve 99% of cases. :-)

ZeroQI commented 3 years ago

Excellent!

The point of downloading YouTube videos is to keep them in case they are removed or the account is deleted, so people should download the json files too. The playlist reversing is very simple, and I am glad support could be added so simply. I will incorporate it in a new release soon while grouping all the YouTube code in one place

Channel mode:

Library mode:

Both

micahmo commented 3 years ago

That all sounds great! Thanks again for the support and the awesome project.

ZeroQI commented 3 years ago

Corrected youtube2 for channels and created all the modes described. Tested, but then added youtube3 without tests: https://gist.github.com/ZeroQI/03e4dbb4f3805305adf8947d7e03c901

micahmo commented 3 years ago

Awesome! I downloaded and tested.

  1. Tried on "normal" playlist that is already chronological on YT. Still looks good.
  2. Tried on playlist that is "backwards" on YouTube. Now it's in correct/reversed order without any special modes needed. Awesome!
  3. Tried [youtube3-xxx] with a playlist that is neither chronological nor reverse chronological. (It is [newest], [oldest], [middle].) There was a small problem, because youtube3 was not yet added as a valid source. I made the changes to SOURCE_IDS and SOURCE_ID_FILES, and then it was understood as a valid source, and it was sorted chronologically by videoPublishedAt!
SOURCE_IDS             = cic(r'\[((?P<source>(anidb(|[2-4])|tvdb(|[2-5])|tmdb|tsdb|imdb|youtube(|[2-3])))-(?P<id>[^\[\]]*)|(?P<yt>(PL[^\[\]]{16}|PL[^\[\]]{32}|(UU|FL|LP|RD|UC|HC)[^\[\]]{22})))\]')
SOURCE_ID_FILES        = ["anidb.id", "anidb2.id", "anidb3.id", "anidb4.id", "tvdb.id", "tvdb2.id", "tvdb3.id", "tvdb4.id", "tvdb5.id", "tmdb.id", "tsdb.id", "imdb.id", "youtube.id", "youtube2.id", "youtube3.id"]
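To check what the updated SOURCE_IDS pattern accepts, here is a small standalone sketch. It assumes that the scanner's `cic()` helper compiles a case-insensitive regex; `parse_source` and its return shape are made up for illustration.

```python
import re


def cic(pattern):
    # Assumption: the scanner's cic() helper compiles a case-insensitive regex.
    return re.compile(pattern, re.IGNORECASE)


SOURCE_IDS = cic(r'\[((?P<source>(anidb(|[2-4])|tvdb(|[2-5])|tmdb|tsdb|imdb|youtube(|[2-3])))-(?P<id>[^\[\]]*)|(?P<yt>(PL[^\[\]]{16}|PL[^\[\]]{32}|(UU|FL|LP|RD|UC|HC)[^\[\]]{22})))\]')


def parse_source(folder_name):
    """Return (source, id) for a recognised tag, or None (hypothetical helper)."""
    match = SOURCE_IDS.search(folder_name)
    if not match:
        return None
    if match.group("yt"):
        # A bare playlist/channel id has no explicit source; 'youtube' is assumed here.
        return ("youtube", match.group("yt"))
    return (match.group("source"), match.group("id"))


print(parse_source("My Show [youtube3-PLaDrN74SfdT5jEs3RCI53nBUkuURSWrhq]"))
print(parse_source("My Show [youtube4-PLaDrN74SfdT5jEs3RCI53nBUkuURSWrhq]"))
```

With `youtube(|[2-3])` in the pattern, youtube3 is accepted while a hypothetical youtube4 tag is still rejected.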

The last problem here is that the agent didn't like folder_show with youtube3- in it. Like before, I was able to fix it by naming the folder with the playlist ID in a separate set of brackets, like [youtube3-playlistid][playlistid]. Obviously that's not ideal, so maybe there's a way to pass just the id to the agent.

Didn't try [youtube2-xxx] for channels or .info.json.

One more note, about line 843. I realized YouTube API only returns 50 max, so no reason for 100 here (my mistake!)

YOUTUBE_PLAYLIST_ITEMS = 'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet,contentDetails&maxResults=100&playlistId={}&key='+API_KEY
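Since a page holds at most 50 items, longer playlists have to be collected by following nextPageToken. A minimal sketch of that loop, with the HTTP call replaced by an injected `fetch_page` function (a made-up parameter) so it can run offline against made-up pages:

```python
def fetch_all_items(fetch_page):
    """Collect items from every page; fetch_page(token) returns one decoded JSON page."""
    items, token = [], None
    while True:
        page = fetch_page(token)
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:
            return items


# Hypothetical in-memory pages standing in for playlistItems responses.
PAGES = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3, 4], "nextPageToken": "p3"},
    "p3": {"items": [5]},
}

print(fetch_all_items(PAGES.get))
```

In real use, `fetch_page` would request the playlistItems URL and append `&pageToken=...` whenever a token is given.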

P.S. While testing, I found a bug with the agent in the rewrite branch. There is still a line (just a log) that expects to find metadata_source from prefs, which was removed in rewrite. I just commented out that log line (100) to fix it.

https://github.com/ZeroQI/YouTube-Agent.bundle/blob/rewrite/Contents/Code/__init__.py#L100

ZeroQI commented 3 years ago

Updated to fix [youtubex-xxxx] mode, 50 items, etc... https://gist.github.com/ZeroQI/03e4dbb4f3805305adf8947d7e03c901

If tests of scanner and agent rewrite branch are OK, will push to master

micahmo commented 3 years ago

Nice, youtube3 now works without changing SOURCE_IDS or SOURCE_ID_FILES and the agent doesn't throw an error about metadata_source now!

There's still a small problem with the agent handling youtube3. I found this in the logs.

2021-03-10 07:35:16,262 (15245bdfe700) :  INFO (__init__:150) - search() - YouTube ID not found - regex: "PLAYLIST"

I think the agent regex doesn't handle youtubex correctly. It needs [0-9]* (or whatever range you want to support).

\[(?:youtube[0-9]*\-)?(?P<id>PL[^\[\]]{16}|PL[^\[\]]{32}|UU[^\[\]]{22}|FL[^\[\]]{22}|LP[^\[\]]{22}|RD[^\[\]]{22}|UC[^\[\]]{22}|HC[^\[\]]{22})\]

Test: https://regex101.com/r/zwElNY/1
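The same check can be run locally. The pattern below is copied from the proposal above; the constant name, the `find_id` helper, and the folder names are made up for illustration.

```python
import re

# Proposed pattern from above: an optional 'youtube<digits>-' prefix before the id.
PROPOSED_REGEX = re.compile(r'\[(?:youtube[0-9]*\-)?(?P<id>PL[^\[\]]{16}|PL[^\[\]]{32}|UU[^\[\]]{22}|FL[^\[\]]{22}|LP[^\[\]]{22}|RD[^\[\]]{22}|UC[^\[\]]{22}|HC[^\[\]]{22})\]')


def find_id(path):
    """Return the playlist/channel id found in the path, or None."""
    match = PROPOSED_REGEX.search(path)
    return match.group("id") if match else None


playlist_id = "PLaDrN74SfdT5jEs3RCI53nBUkuURSWrhq"
for tag in ("[youtube3-{}]", "[youtube-{}]", "[{}]"):
    print(find_id("My Show " + tag.format(playlist_id)))
```

All three folder forms yield the same id, so the agent would not need to know which mode the scanner used.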

ZeroQI commented 3 years ago

All youtube2/youtube3 modes are (should have been) changed to "title [youtube-xxxxx]" when passed to the agent, as it doesn't need to know whether episodes are in a season or not. It only needs the video id in the filename to match the episode metadata correctly, regardless of episode number or season, using the json file if present or an API call for the playlist or channel items

folder_show included youtube3 and passed it to the agent, which is seemingly not equipped to deal with that...

I then cleaned up the calls to insert the id into the folder_show variable

Updated code on gist: https://gist.github.com/ZeroQI/03e4dbb4f3805305adf8947d7e03c901

micahmo commented 3 years ago

Ahh ok, I wasn't sure if the fix should be in the scanner (always pass [youtube-xxx] to agent), or in the agent (accept [youtubex-xxx]). Makes sense that the scanner should clean it up before passing to agent.

Got the latest gist, but agent still doesn't like it for some reason. I think the problem is that the agent always parses the file path, which still has youtube3 in the name. I'm not sure how the agent would access the show name as given by the scanner. Maybe the scanner can manipulate the path that the agent parses? Or maybe that's dangerous. :-)

  try:
    for regex, url in [('PLAYLIST', YOUTUBE_PLAYLIST_REGEX), ('CHANNEL', YOUTUBE_CHANNEL_REGEX), ('VIDEO', YOUTUBE_VIDEO_REGEX)]:
      result = url.search(filename)
      if result:
        guid = result.group('id')
        Log.Info(u'search() - YouTube ID found - regex: {}, youtube ID: "{}"'.format(regex, guid))
        results.Append( MetadataSearchResult( id='youtube|{}|{}'.format(guid,os.path.basename(dir)), name=displayname, year=None, score=100, lang=lang ) )
        Log(u''.ljust(157, '='))
        return
      else: Log.Info('search() - YouTube ID not found - regex: "{}"'.format(regex))  
  except Exception as e:  Log('search() - filename: "{}" Regex failed to find YouTube id, error: "{}"'.format(filename, e))

ZeroQI commented 3 years ago

The scanner adds a series title "serie title [youtube-xxxxx]", plus the season, episode, file path and year, and that's roughly it. The agent search function gets the series name and file path, and assigns a unique metadata.id... The agent update function then gets the metadata.id and downloads the episode metadata according to the video id and the channel/playlist id

The *.agent.search.log for the impacted series would show that info, and the custom per-series scanner logs would show the series title "title [youtube-xxxx]" that was passed

micahmo commented 3 years ago

Alright, somewhere there is a disconnect where the agent does not like the [youtube3-xxx]. I think I tracked it down.

if len(guid)>2 and guid[0:2] in ('PL', 'UU', 'FL', 'LP', 'RD'):
    . . .
#NOT PLAYLIST NOR CHANNEL GUID
elif not (guid.startswith('UC') or guid.startswith('HC')):  
    Log.Info('No GUID so random folder')
    metadata.title = series_folder  #instead of path use series foldername

So ultimately, the update function is relying on the search function to put the playlist id in the metadata, but it doesn't do that because the filepath doesn't match the playlist regex.
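To make that dependency concrete, here is a small sketch. The id format and the prefix checks are taken from the snippets quoted in this thread; the function names, the split-based parsing, and the sample path are assumptions for illustration.

```python
import os


def build_metadata_id(guid, series_dir):
    # Same shape as the id built in search(): 'youtube|<guid>|<series folder name>'.
    return "youtube|{}|{}".format(guid, os.path.basename(series_dir))


def classify_guid(guid):
    """Mirror the prefix checks quoted above (hypothetical helper)."""
    if len(guid) > 2 and guid[0:2] in ("PL", "UU", "FL", "LP", "RD"):
        return "playlist"
    if guid.startswith("UC") or guid.startswith("HC"):
        return "channel"
    return "folder"  # neither playlist nor channel: treated as a random folder


metadata_id = build_metadata_id("PLabc", "/media/Youtube/My Show [youtube-PLabc]")
# Assumption: the update side recovers the parts by splitting on '|'.
_, guid, series_folder = metadata_id.split("|", 2)
print(classify_guid(guid), series_folder)
```

If search() cannot extract the playlist id from the path, whatever ends up in the guid slot falls through to the "folder" branch, which matches the behaviour described above.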

Hopefully that all makes sense! :-)

ZeroQI commented 3 years ago

Yes, makes perfect sense

Please let me know any remaining issue but should be more stable now

micahmo commented 3 years ago

Alrighty, I started with a clean slate. Grabbed scanner from master and agent from master. Tested all scenarios (normal, reversed, chronological). All good!!

Only one tiny problem. It looks like metadata_source came back when merging rewrite to master (here). But I removed it (before performing tests above), and otherwise everything is perfect!

ZeroQI commented 3 years ago

Closing that case at last. Thanks for your assistance: the code is better, and the nice youtube2/3 modes are now functional

micahmo commented 3 years ago

Awesome, everything is perfect now. Can't thank you enough for the time you've spent on this!!

ZeroQI commented 3 years ago

Excellent! Thanks for the donation, much appreciated

I'm not using the YouTube agent myself, so there is not much motivation or in-depth testing on my side, so that helps greatly in that regard. Now, with the playlist json support, I need to add it to the agent and we could be pretty much offline...

Let me know if you encounter any issues or have questions

alneven commented 1 year ago

@ZeroQI Maybe you could update the GitHub readme table with “youtube3” for reverse playlist listings where new episodes are added to the top. I had this issue and found out here that I don't need youtube2 but the youtube3 string in the folder name. Thanks!

PS: My folder structure now looks like this

Youtube /
    Playlist.Name1 [PLxxIDxx] / Season.2023
    Playlist.Name2 [PLxxIDxx] / Season.2022
    Playlist.Name2 [PLxxIDxx] / Season.2023

When a playlist is first scanned in Plex and shows up, I rename it from the channel name to the playlist name. If I have something from the same channel but a different playlist, it is not under the channel, but directly under the playlist name.

So I don't have

Youtube /
    Channel [UCxxIDxx] / Playlist.Name1 [PLxxIDxx]
    Channel [UCxxIDxx] / Playlist.Name2 [PLxxIDxx]

which would mean I don't need Season.YYYY folders, but sometimes there would be a lot of files in one playlist folder within the channel folder, and maybe Plex would show only the channel, with the playlists as “season folders”.

@micahmo what structure works for you?

micahmo commented 1 year ago

@alneven I don't do any nesting with seasons or playlists within channels. I find it makes things simpler if everything is top-level.

I just checked, and all of mine fall into one of these categories.

Note especially the last one: if I want to download videos related to a channel, I will most often use the channel playlist. That gives the best chance of chronological sorting without collisions. The only small downside is that deleting a video changes the index of subsequent videos.