dvehrs / podget

Podcast aggregator optimized for running as a scheduled job (i.e. cron) on Linux
GNU General Public License v3.0

Add youtube-dl integration #18

Open klundry opened 7 years ago

klundry commented 7 years ago

Youtube-dl supports checking youtube channel and playlist feeds and downloading them. Can we add the ability to add youtube feeds to podget and have them downloaded just like a podcast? Would be really nice to have automated local copies of the newest youtube videos available to watch offline.

dvehrs commented 7 years ago

Youtube-dl integration has been suggested before and I have not been able to conceive of a good way to do it. When you look at youtube-dl and the number of options that it allows the user to configure per playlist it becomes fairly intimidating to contemplate working it into the way podget handles feeds. For example, youtube-dl has 19 options just for video selection or 21 file system options.

Now there may be a way to do it that I'm not seeing so I will leave this issue open. If someone does have an idea, please submit a comment and we'll see if it can be done.

klundry commented 7 years ago

There is a section on the youtube-dl github on embedding it into other programs here.

I think the easiest way would be to start with no options. Youtube-dl defaults to a sane best available format when run with no options. You could add a youtube channel feed and when it sees a new episode just invoke "youtube-dl 'video-url'" and it will download whatever the best available version of the video is. Adding other commonly used options isn't very complicated either. Here is an excerpt from the youtube-dl page showing examples of selecting different formats. Looks fairly simple to me, but I'm not sure how it would fit into the way podget works.

Download best mp4 format available or any other best if no mp4 available

$ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'

Download best format available but not better than 480p

$ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'

Download best video only format but no bigger than 50 MB

$ youtube-dl -f 'best[filesize<50M]'

Download best format available via direct link over HTTP/HTTPS protocol

$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'

Download the best video format and the best audio format without merging them

$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
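
A minimal sketch of the "no options first, one configurable format string later" idea described above. The names `YTDL_FORMAT` and `ytdl_command` are hypothetical and not part of podget. The function only prints the command it would run, so it can be tried without youtube-dl installed.

```shell
# Hypothetical setting: empty means "let youtube-dl pick its default".
YTDL_FORMAT="${YTDL_FORMAT:-}"

# Print the youtube-dl command line for one video URL.
# Printing instead of executing keeps the sketch free of side effects.
ytdl_command() {
    if [ -n "$YTDL_FORMAT" ]; then
        printf "youtube-dl -f '%s' '%s'\n" "$YTDL_FORMAT" "$1"
    else
        printf "youtube-dl '%s'\n" "$1"
    fi
}

# VIDEO_ID is a placeholder, not a real video.
ytdl_command 'https://www.youtube.com/watch?v=VIDEO_ID'
```

Setting `YTDL_FORMAT` to any of the format strings from the excerpt above would switch the printed command to the `-f` form.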

dvehrs commented 7 years ago

The problem is that as soon as we add basic support, one user will want just this one feature and another will want something different. When I first wrote this script it was a couple hundred lines. It's now just short of 2800. I'm a little hesitant because this has the potential to become a huge commitment. Now I'm open to ideas, and given a working start we can fork a thread to host its development in parallel to the others.

The first question will be how do we download a feed that we can parse from Youtube so we can track what we've already downloaded and get what's new. Or are we reinventing the wheel? Does youtube-dl already do this in other ways?
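
One way the tracking question could be approached, sketched under assumptions: keep a plain-text log of URLs already fetched and only pass along URLs that are not in it. The function name `new_urls` and the log file are hypothetical, not podget's actual mechanism. (youtube-dl also has a `--download-archive FILE` option that records the IDs of downloaded videos, which may make a separate log unnecessary.)

```shell
# Print the URLs read from stdin that are not yet listed in the log file $1.
# The body runs in a subshell so its variable does not leak to the caller.
new_urls() (
    seen_log="$1"
    # Create the log on the first run so grep has a file to read.
    [ -f "$seen_log" ] || : > "$seen_log"
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits 1 when nothing is new; that is not an error here.
    grep -F -x -v -f "$seen_log" || true
)

# Demo with a temporary log that already contains one URL.
demo_log=$(mktemp)
echo 'https://www.youtube.com/watch?v=AAA' > "$demo_log"
printf '%s\n' 'https://www.youtube.com/watch?v=AAA' \
              'https://www.youtube.com/watch?v=BBB' | new_urls "$demo_log"
rm -f "$demo_log"
```

After a successful download the caller would append the URL to the log so it is skipped on the next run.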

tastytea commented 7 years ago

Youtube channels are available as RSS feeds at the address: https://www.youtube.com/feeds/videos.xml?channel_id=XXXX. The Video URLs are in media:content tags and in link tags, for example:

or

<link rel="alternate" href="http://www.youtube.com/watch?v=q0unAVSjLnw"/>

These could be passed to youtube-dl and the command could be configurable, like the wget command.
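
A rough sketch of pulling the video URLs out of such a feed so they could be handed to youtube-dl. It assumes each video appears as a `link rel="alternate"` tag whose `href` contains `watch?v=`, as in the example above. `video_urls` is a hypothetical helper, and the regex approach is only for brevity; a real XML parser would be more robust.

```shell
# Print the watch-page URLs found in <link rel="alternate" .../> tags.
# Reads the feed XML on stdin. Links without "watch?v=" (for example a
# link to the channel page) are skipped.
video_urls() {
    grep -o '<link rel="alternate" href="[^"]*watch?v=[^"]*"' |
        sed 's/.*href="\([^"]*\)".*/\1/'
}

# Demo on a tiny hand-written sample, not a real feed download.
video_urls <<'EOF'
<feed>
 <link rel="alternate" href="https://www.youtube.com/channel/XXXX"/>
 <entry>
  <link rel="alternate" href="http://www.youtube.com/watch?v=q0unAVSjLnw"/>
 </entry>
</feed>
EOF
```

In practice the input could come from something like `wget -q -O - "https://www.youtube.com/feeds/videos.xml?channel_id=XXXX" | video_urls`.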

dvehrs commented 7 years ago

Interesting method. Now this would require adding a third type of feed handling to podget but that may be doable.

Does anyone have an example channel_id that they are interested in that I can start playing with to see how easy it would be for podget to identify it and do the correct handling?

tastytea commented 7 years ago

UCp3iXxis9n_E_GfbE-_ksFw

klundry commented 7 years ago

N-O-D-E is a neat channel. I believe the ID is the string of characters at the end of the URL correct? So from the node channel URL it would be UCvrLvII5oxSWEMEkszrxXEA I think.

https://m.youtube.com/channel/UCvrLvII5oxSWEMEkszrxXEA

dvehrs commented 7 years ago

OK, the link that Klundry provided does not adhere to the RSS standard so it would be very difficult to adapt Podget to use.

However, the link that tastytea provided does adhere to the standard and works with either the channel ID provided by Klundry or tastytea. Additionally, this feed does appear to put the video files into the media:content or link tags as described by tastytea. It appears that each video file appears once in each tag. Now the link tag is used one extra time for the full channel feed so we would have to automatically not follow that link or just use the media:content tags. I need to experiment with youtube-dl to see which is preferable.

Additional items TODO:

  1. Determine how the script will identify the Youtube feeds to hand off to the correct parsing subroutines. Podget tries to determine this with a tag used within the first 9 lines of the feed. For the Youtube feeds, it appears we may be able to use yt:channelId which appears on the first line.
  2. Add documentation for how to add Youtube RSS Feeds.
  3. Do we make Youtube-DL an automatic dependency or merely a suggested item for people who want to get Youtube videos? I'm leaning to the second option but that means adding some checks to determine if Youtube-DL is installed when one of its feeds is found. We also don't want to break Podget's ability to work on distributions that do not offer Youtube-DL.

dvehrs commented 7 years ago

OK, tested the media:content URLs with wget and youtube-dl. Wget does not grab the right file but youtube-dl does. Additionally youtube-dl renames the downloaded file to the name of the video so that is very handy. The downloaded file is in webm format but my testing shows it works in both VLC and SMPlayer.

dvehrs commented 7 years ago

OK, tested the youtube-dl option strings as suggested by klundry and we may be able to work in a default format string to use. Only one is giving me a problem and that's the filesize option. Not sure why yet but that one may be the most desirable so it would be good to figure out.

dvehrs commented 7 years ago

OK this is weird.

The string "best[filesize<50M]" does not work. It fails with "ERROR: requested format not available". I tried various filesizes and it fails for each.

But "bestvideo[filesize<10M]" does work for a variety of sizes.

And "bestvideo[ext=mp4,filesize<5M]+bestaudio[ext=m4a]" works.

So the one format that appears to have a problem is the one that they suggest in their man page.
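
One possible explanation, offered as an assumption rather than something confirmed in this thread: the youtube-dl documentation says that formats whose value for a filtered field is unknown are excluded unless a question mark follows the operator. If the single-file (`best`) formats report no file size, `best[filesize<50M]` would match nothing. The sketch below builds a selector that tries the strict filter first and then falls back to the `?` form. `size_limited_format` is a hypothetical helper name.

```shell
# Print a format selector for a size limit such as 50M.
# First choice: a single-file format with a known size under the limit.
# Fallback: the same filter with "?" so formats of unknown size are accepted.
size_limited_format() {
    printf 'best[filesize<%s]/best[filesize<?%s]\n' "$1" "$1"
}

size_limited_format 50M
```

It could be used as `youtube-dl -f "$(size_limited_format 50M)" 'video-url'`. Note that the fallback can fetch a file larger than the limit when the size is simply not reported.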

dvehrs commented 7 years ago

OK, I just created a new branch 'dev-youtubedl' to start experimenting with this. No new code yet but I have some ideas.

dvehrs commented 7 years ago

I've started some ideas that I uploaded to the branch. Basic functionality is there for using the feeds URL and I've downloaded the Vegan/NODE feeds. I have not worked out any of the verbosity or other options. I'm already starting to get concerned that this will create complexities that make podget more difficult to maintain than it's worth. Making youtube-dl work in a way that is compatible with the existing framework that we have for wget takes more than a little massaging.

I'm also concerned with how much this will add to the documentation on what options can be used with which feed type in the serverlist file.

And just to make things fun, it turns out I was wrong. The Youtube feeds are not a unique feed type but rather a customized ATOM feed. So basically we test if the feed is RSS or ATOM then if ATOM a second test is done to determine if it is YOUTUBE or not.
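
The two-step test described here could look roughly like the sketch below. It is not the code from the dev-youtubedl branch. The markers it looks for (`<rss`, `<feed`, `yt:channelId`) and the function name `feed_type` are assumptions, as is reusing the 9-line window mentioned earlier in the thread.

```shell
# Classify a feed read from stdin as RSS, ATOM, YOUTUBE or UNKNOWN,
# looking only at its first 9 lines.
feed_type() (
    head_lines=$(head -n 9)
    case "$head_lines" in
        *'<rss'*) echo RSS ;;
        *'<feed'*)
            # An Atom feed: check whether it is the Youtube flavour.
            case "$head_lines" in
                *'yt:channelId'*) echo YOUTUBE ;;
                *) echo ATOM ;;
            esac ;;
        *) echo UNKNOWN ;;
    esac
)

# Demo on a hand-written fragment, not a real feed download.
feed_type <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 <yt:channelId>XXXX</yt:channelId>
</feed>
EOF
```

Each result would then select the matching parsing routine, with YOUTUBE items handed to youtube-dl instead of wget.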

At this early stage, I'm starting to wonder if we wouldn't be better off creating a dedicated youtube-dl fork of podget rather than trying to squeeze support into a single script. Although I'm not certain if that would really be needed because you can point youtube-dl at the channel URL and it will download all the videos in the channel.

Also the feed URL only seems to grab the 15 latest episodes from the channel. I don't know if that will be a problem or not but it is an issue that I've encountered and haven't figured out a fix to yet.

klundry commented 7 years ago

Why not start with the most simple options, like the ones with special names that automatically select certain formats for you?

best: Select the best quality format represented by a single file with video and audio.
worst: Select the worst quality format represented by a single file with video and audio.
bestvideo: Select the best quality video-only format (e.g. DASH video). May not be available.
worstvideo: Select the worst quality video-only format. May not be available.
bestaudio: Select the best quality audio only-format. May not be available.
worstaudio: Select the worst quality audio only-format. May not be available.

Then work out how to select other quality settings and formats later.

tastytea commented 7 years ago

I have used the dev-youtubedl branch for a few days now with 17 youtube feeds and it's working well. There's just one little problem with playlists: youtube-dl sometimes recodes videos to mkv (changing the file extension), but the files in the playlist have the original file extension. The easiest fix would probably be to add --recode-video mkv to the options so that all files are mkv.

dvehrs commented 7 years ago

OK, I've been experimenting with the idea of forking a completely separate version of podget to handle youtube feeds. So far this has reduced the length of the script and made it easier to maintain (or add klundry's ideas). I have not decided if this new version would be best distributed in podget's script directory or if it deserves to be its own project. I'm leaning towards the latter but no decision yet.

dvehrs commented 7 years ago

podtube.zip

OK, here is the file I'm experimenting with. I like it better than the dev-youtubedl branch. What do you think?

klundry commented 7 years ago

Working pretty well for me but it doesn't seem to be respecting the "MOST_RECENT" config option. I had to ctrl-c because it was trying to download every video in the channel.

Are there any config options for pulling audio only, etc.? Or is that something you are still working on?

dvehrs commented 7 years ago

OK, it appears that I left out filtering by MOST_RECENT. Well only one small portion but it's the important bit. I will try to add that.
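
For illustration only, the missing filter could be as small as the sketch below. This is not the podtube code. It assumes the list of video URLs arrives newest first and that a value of 0 means "no limit"; `limit_most_recent` is a hypothetical name.

```shell
# Keep only the first N lines of stdin. N=0 (or unset) passes everything.
# The body runs in a subshell so its variable does not leak to the caller.
limit_most_recent() (
    n="${1:-0}"
    if [ "$n" -gt 0 ]; then
        head -n "$n"
    else
        cat
    fi
)

# Demo: three placeholder URLs, keep the two newest.
printf '%s\n' 'url-newest' 'url-middle' 'url-oldest' | limit_most_recent 2
```

Placed between the step that lists a channel's videos and the download loop, this would stop the script from fetching every video in the channel.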

I have not added any options for filtering the download based on quality wanted or audio only or anything beyond the basics.

However, the important question now is which direction do we go with the development? Do we continue to try to integrate youtube handling into podget or do we focus on a new script (podtube)? I'm leaning to the second option but then we need to decide if it's best to simply include it within podget's scripts directory or do we fork it as its own GitHub project? My vote is for podtube and I'm heavily leaning towards setting it up in a new project. The reason I feel that way is podtube will have different requirements for install than podget has. By separating them we would maintain compatibility with the distros that already run podget.

klundry commented 7 years ago

Is adding youtube-dl as a dependency what may cause issues and why you want to keep it separate? I don't necessarily feel strongly one way or the other. On one hand it would be convenient for me personally to be able to get all my podcasts in one place if it were part of podget. On the other hand I can see there being a lot of use cases where you want to pull down other content that is not a podcast and might make more sense as a separate script.

dvehrs commented 7 years ago

I may be dating myself here but I remember when podcasts meant files to download and play on an iPod. These were almost exclusively MP3s. I don't think the genre was ever intended to include videos or anything else that happens to be packable within an RSS or ATOM feed. When I released podget back in 2005, I didn't realize I would still be supporting it 12 years later. So when I have worries about maintaining it, that is the reason.

I've also witnessed podget use expand to more distributions and OSes than I ever planned on. I've worked to reduce the dependency footprint to make using podget convenient for as many people as possible. So when we discuss adding a dependency, it makes me hesitate. Especially when that dependency is for a use that the majority of users may not care about. I can count on one hand the number of people who have asked about Youtube support in 12 years.

klundry commented 7 years ago

Well, I'm glad you created it and I think you've done an excellent job! Thank you for sharing it with the world. I can totally see where you're coming from. Honestly, I wish I could just get all the things I want to watch through an rss or atom feed. Youtube wouldn't be needed then and podget would work great as is. Unfortunately, youtube doesn't work that way and there are too many otherwise good podcasts/vidcasts that only release on youtube or some other user unfriendly method.

dvehrs commented 7 years ago

Thank you. It's always good to know that what we've done has helped others. Even if only in a small way. I hope that the changes we are discussing here can do the same and the challenge we face is finding a way that can survive for years.

I do agree that it would be nice if we could get everything we wanted through RSS or Atom feeds but I doubt the world is going to sit back and listen to our requests. So our task is to find ways to make it easy for the user regardless of what the world does. Now I'm fairly confident that Youtube will be around for a long time so finding ways to make it easy for users to consume their content is a good goal.

I haven't forgotten about this but life away from the keyboard has its own way of slowing my progress. I hope to get back to this soon.

klundry commented 7 years ago

"I haven't forgotten about this but life away from the keyboard has its own way of slowing my progress. I hope to get back to this soon."

Ditto! Keep up the good work when you have the time. I'm looking forward to seeing the next revision. :)

bingalls commented 6 years ago

@klundry @tastytea Could you create a short list of favorite youtube channels in serverlist format, and by genre (programming tutorials, movie reviews, etc)? Perhaps we can get podget added to https://github.com/sindresorhus/awesome and get enough cred for a Homebrew listing! 😉 You can host these yourself on github, or perhaps the podget wiki will accept them.