Closed: Termiiii closed this issue 6 years ago.
Hi, YoutubeExplode actually skips segmented DASH streams (the ones that need to be combined) on purpose: https://github.com/Tyrrrz/YoutubeExplode/blob/c85bf3bc4b8ef12f8152372bf2a7a5aa2c84dc5a/YoutubeExplode/YoutubeClient.Video.cs#L399
Can you explain why you would want to download those streams?
For my project, I need to download (certain) new videos as fast as possible and analyze them with TensorFlow. The quicker, the better.
Currently, I try to download either format 136 (video-only MP4, 720p) or format 22 (muxed MP4, 720p) from the download URL (obtained from youtube-dl) using ffmpeg. I use ffmpeg to download the video as multiple .mp4 parts, each 1 minute long. That way TensorFlow can start working after around 2 seconds. Some YouTube videos are several hours long, so it would be a waste if TensorFlow had to wait a few minutes before it could start.
I could speed my project up a lot if I could download format 136 into several .mp4 parts. Since format 136 often has no working download URL, I have to wait until format 22 is available, which slows everything down by minutes on average.
Currently I am trying to learn Python in order to understand what youtube-dl does, but some insight from you would be awesome as well. I read your guide on reverse-engineering YouTube half a year ago; that's why I am here. I initially thought YoutubeExplode would be able to download DASH formats. As said before, I strongly prefer C# over Python.
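As a rough illustration of the ffmpeg splitting approach described above, the sketch below builds an argument list for ffmpeg's segment muxer, which copies the stream without re-encoding and cuts it into consecutive one-minute .mp4 files. The class and method names, the output pattern "part%03d.mp4", and this exact flag set are assumptions for illustration, not necessarily the command used in this thread. ProcessStartInfo.ArgumentList requires .NET Core 2.1 or later, and ffmpeg is assumed to be on PATH.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class FfmpegSegmenter
{
    // Builds ffmpeg arguments that copy the input stream (no re-encoding)
    // and split it into consecutive .mp4 files of roughly segmentSeconds each.
    public static List<string> BuildSegmentArgs(string inputUrl, int segmentSeconds, string outputPattern)
    {
        return new List<string>
        {
            "-i", inputUrl,
            "-c", "copy",                               // stream copy: fast, no quality loss
            "-f", "segment",                            // ffmpeg's segment muxer
            "-segment_time", segmentSeconds.ToString(), // target length of each part
            "-reset_timestamps", "1",                   // each part starts at t=0
            outputPattern                               // e.g. "part%03d.mp4"
        };
    }

    // Starts ffmpeg (assumed to be on PATH) with the arguments above.
    public static Process Start(string inputUrl, int segmentSeconds, string outputPattern)
    {
        var psi = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
        foreach (var arg in BuildSegmentArgs(inputUrl, segmentSeconds, outputPattern))
            psi.ArgumentList.Add(arg);
        return Process.Start(psi);
    }
}
```

Because "-c copy" can only cut at keyframes, each part is approximately, not exactly, 60 seconds long.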
"Those segmented DASH streams that need to be combined, YoutubeExplode actually purposely skips them."
Correct me if I am wrong: only streams that have a working BaseURL, and therefore don't need to be downloaded in parts, should be returned by GetVideoMediaStreamInfosAsync(videoId), right?
var streamInfoSet = await client.GetVideoMediaStreamInfosAsync(videoId);
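// Excerpt from YoutubeClient.Video.cs; runs inside a loop over the DASH manifest's stream elements (streamXml)
// Skip partial streams: their initialization URL contains the "sq/" (segment sequence) path component
if (streamXml.Descendants("Initialization").FirstOrDefault()?.Attribute("sourceURL")?.Value
        .Contains("sq/") == true)
    continue;
// Extract values
var itag = (int) streamXml.Attribute("id");
var url = (string) streamXml.Element("BaseURL");
var bitrate = (long) streamXml.Attribute("bandwidth");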
I can assure you that streams with non-working BaseURLs (like some format 136 streams) are returned by GetVideoMediaStreamInfosAsync(videoId). For example, this code throws an unhandled exception:
using System;
using System.Threading.Tasks;
using YoutubeExplode;
using YoutubeExplode.Models.MediaStreams;
public class Program {
    // async Task instead of async void, so the caller can await it and observe exceptions
    public async Task TestStuff() {
        var videoId = "aPEFSbW0-po"; // 2-minute video
        var client = new YoutubeClient();
        // Get information about all available formats (and more)
        var streamInfoSet = await client.GetVideoMediaStreamInfosAsync(videoId);
        var infos = streamInfoSet.GetAll();
        // Walk through each individual format available
        foreach (MediaStreamInfo info in infos) {
            // Print and download the BaseURL of format 136 (if available)
            if (info.Itag == 136) {
                Console.WriteLine("Download URL for format 136:");
                Console.WriteLine(info.Url + "\n");
                Console.WriteLine("Downloading...");
                await client.DownloadMediaStreamAsync(info, "D:\\test.mp4");
            }
        }
    }
}
Exception: System.Net.Http.HttpRequestException: "Response status code does not indicate success: 404 (Not Found)." thrown at the line: await client.DownloadMediaStreamAsync(info, "D:\\test.mp4");
Yes, that's the idea. Although it's possible to implement downloading of such streams, I've never seen the point since there were easier alternatives.
If it throws an exception, then it is indeed a bug, and it looks like the if condition that checks whether a stream is partial needs to be updated.
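One way that partial-stream check could be broadened is sketched below. It assumes, based on the "sq/" test in the library excerpt earlier in this thread, that segmented representations in YouTube's DASH manifest reference their fragments through SegmentURL elements or through an Initialization sourceURL containing "sq/"; that manifest structure is an assumption, not something confirmed here. Element names are matched by LocalName so the check works whether or not the MPD namespace has been stripped. The class and method names are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DashFilter
{
    // Returns true if a DASH <Representation> element looks segmented (partial),
    // i.e. it would have to be downloaded fragment by fragment rather than via BaseURL.
    public static bool IsPartial(XElement representation)
    {
        var descendants = representation.Descendants().ToList();

        // Any per-fragment URL means the stream is segmented.
        bool hasSegmentUrls = descendants.Any(e => e.Name.LocalName == "SegmentURL");

        // The original check: an initialization URL containing the sequence marker "sq/".
        bool hasSequenceInit = descendants
            .Where(e => e.Name.LocalName == "Initialization")
            .Select(e => (string)e.Attribute("sourceURL") ?? "")
            .Any(url => url.Contains("sq/"));

        return hasSegmentUrls || hasSequenceInit;
    }
}
```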
Btw, is there a reason you are specifically interested in itag 136 and 137? Is it because it appears sooner than other formats?
That's exactly it.
The only reason I am interested in 137 is that sometimes videos are uploaded in 1080p, and then that format is available as soon as the video is made public. I don't think I ever had a broken BaseURL for format 137. So if format 136 does not work, I use format 137 (if available). Otherwise, I have to wait until either format 137 or format 22 is uploaded.
The above-mentioned video (ID "aPEFSbW0-po") only has a broken BaseURL for formats 133, 135 and 136. I don't know whether I've ever had a broken BaseURL for format 134.
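The fallback order described above (136 first, then 137, otherwise wait for 137 or 22) can be written as a small selection function. This is a sketch: the class and method names, passing in a set of itags already known to be broken, and returning null to mean "keep waiting and query again" are assumptions for illustration.

```csharp
using System.Collections.Generic;

public static class FormatChooser
{
    // Preference order from the thread: 136 (720p video-only), then 137
    // (1080p video-only), then 22 (720p muxed).
    private static readonly int[] PreferredItags = { 136, 137, 22 };

    // Returns the first preferred itag that is available and not known to be broken,
    // or null if the caller should wait and query the video again later.
    public static int? Choose(ISet<int> availableItags, ISet<int> brokenItags)
    {
        foreach (var itag in PreferredItags)
        {
            if (availableItags.Contains(itag) && !brokenItags.Contains(itag))
                return itag;
        }
        return null;
    }
}
```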
Okay, I see. So, first of all, I need to fix the filtering logic so that partial streams are not shown, because currently, YTE doesn't know how to download them. Then, I suppose, a separate issue needs to be created to add support for downloading of partial streams.
Wow, awesome that you're considering putting my problem on your TODO list. If I make any progress myself, I will tell you ASAP. Don't expect anything soon, though; I am struggling with small things more than I should.
Okay, I just tested it and couldn't reproduce this. On the video you mentioned, I couldn't find itag 136 (after many retries), but I tried itag 137 and it downloaded perfectly fine.
As said above, I didn't have problems with format 137 either (I put it on the same list as 136 because it might cause problems, since the two are very similar). The formats (itags) that I regularly have or had problems with are 133, 135 and 136.
"On the video you mentioned I couldn't find itag 136 (until many retries)"
That's odd; I find format 136 every time I use YoutubeExplode, exactly 100 out of 100 times. Here is the (slow) code:
using System;
using System.Threading.Tasks;
using YoutubeExplode;
using YoutubeExplode.Models.MediaStreams;
public class Program {
    // async Main (C# 7.1+) so each request is awaited in turn instead of
    // firing 100 overlapping async void calls
    static async Task Main(string[] args) {
        Program program = new Program();
        for (int i = 0; i < 100; i++) {
            await program.PrintAllFormats();
        }
        Console.ReadLine();
    }
    // Prints all formats (itags) that are available for the video defined in the method
    public async Task PrintAllFormats() {
        var videoId = "aPEFSbW0-po"; // 2-minute video
        var client = new YoutubeClient();
        string s = "";
        // Get information about all available formats (and more)
        var streamInfoSet = await client.GetVideoMediaStreamInfosAsync(videoId);
        var infos = streamInfoSet.GetAll();
        // Collect the itag of every available format on one line
        foreach (MediaStreamInfo info in infos) {
            s += info.Itag + " ";
        }
        Console.WriteLine(s);
    }
}
But I am aware of that kind of behaviour (getting different itags when sending multiple requests). A youtube-dl developer wrote the following comment:
We also try looking in get_video_info since it may contain different dashmpd URL that points to a DASH manifest with possibly different itag set (some itags are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH manifest pointed by get_video_info's dashmpd). The general idea is to take a union of itags of both DASH manifests (for example video with such 'manifest behavior' see https://github.com/rg3/youtube-dl/issues/6093).
Source (lines 1573-1578): https://github.com/rg3/youtube-dl/blob/e06632e3fe25036b804a62469bb18fa4c37e3368/youtube_dl/extractor/youtube.py
Also, if you try to download these DASH segments, you might want to keep in mind another thing the same developer wrote:
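The union idea from that comment could look roughly like this in C#: collect the itags found in each manifest (for example, one reached via the webpage's dashmpd and one via get_video_info's dashmpd) and merge them, keeping the first URL seen for each itag. The itag-to-URL dictionary representation and the names are assumptions for illustration; fetching and parsing the manifests is not shown.

```csharp
using System.Collections.Generic;

public static class ManifestMerge
{
    // Merges itag -> URL maps from several DASH manifests. When the same itag
    // appears in more than one manifest, the first URL encountered is kept.
    public static Dictionary<int, string> UnionByItag(params IReadOnlyDictionary<int, string>[] manifests)
    {
        var merged = new Dictionary<int, string>();
        foreach (var manifest in manifests)
        {
            foreach (var pair in manifest)
            {
                if (!merged.ContainsKey(pair.Key))
                    merged[pair.Key] = pair.Value;
            }
        }
        return merged;
    }
}
```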
YouTube may often return 404 HTTP error for a fragment causing the whole download to fail. However if the same fragment is immediately retried with the same request data this usually succeeds (1-2 attemps is usually enough) thus allowing to download the whole file successfully. To be future-proof we will retry all fragments that fail with any HTTP error.
Source (lines 54-59): https://github.com/rg3/youtube-dl/blob/e06632e3fe25036b804a62469bb18fa4c37e3368/youtube_dl/downloader/dash.py
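Following that advice, each fragment request could be wrapped in a small retry helper. The sketch below retries only on HttpRequestException (which HttpResponseMessage.EnsureSuccessStatusCode throws for a 404) and rethrows after a fixed number of attempts. The class and method names, the default of 3 attempts, and the short delay are assumptions, not youtube-dl's exact behavior.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class Retry
{
    // Runs the operation, retrying on HTTP errors up to maxAttempts times in total.
    // The last failure is rethrown if every attempt fails.
    public static async Task<T> WithRetryAsync<T>(Func<Task<T>> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // Per the youtube-dl comment, retrying a failed fragment usually succeeds.
                await Task.Delay(delay);
            }
        }
    }

    // Downloads one fragment with retries. EnsureSuccessStatusCode throws
    // HttpRequestException for non-success status codes such as 404.
    public static Task<byte[]> DownloadFragmentAsync(HttpClient http, string url, int maxAttempts = 3) =>
        WithRetryAsync(async () =>
        {
            using (var response = await http.GetAsync(url))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsByteArrayAsync();
            }
        }, maxAttempts, TimeSpan.FromMilliseconds(200));
}
```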
Hm. I will have to keep trying then. I know sometimes I was able to get different itag sets one day apart.
I just updated my above comment one last time.
I previously wrote my own YouTube downloader, but the behavior you mentioned made me switch to youtube-dl. YouTube tries really hard to make reverse engineering difficult. It is really nerve-wracking dealing with all the issues YouTube throws at you.
I finally caught itag 136, and it didn't have a working URL. It was actually not inside the DASH manifest but among the embedded adaptive streams, so it shouldn't be partial. Also, the content length property of that stream was 0, which was a good giveaway that the stream was faulty. I'm not sure if it's an error on YouTube's side, but it has happened before that some streams just don't work.
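Since a content length of 0 gave the faulty stream away, a cheap check before downloading could skip such streams. The sketch uses a hypothetical StreamCandidate class rather than YoutubeExplode's own types, and treating a length of 0 as broken is a heuristic drawn from this single observation, not a documented YouTube rule.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a parsed stream entry (not a YoutubeExplode type).
public sealed class StreamCandidate
{
    public StreamCandidate(int itag, string url, long contentLength)
    {
        Itag = itag;
        Url = url;
        ContentLength = contentLength;
    }

    public int Itag { get; }
    public string Url { get; }
    public long ContentLength { get; }
}

public static class StreamSanity
{
    // Drops streams that report a content length of 0 (or less); in the case
    // described above, such a stream's URL turned out not to work.
    public static List<StreamCandidate> DropEmpty(IEnumerable<StreamCandidate> streams) =>
        streams.Where(s => s.ContentLength > 0).ToList();
}
```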
I started using YoutubeExplode a few days ago (I think I have the newest version). I was using youtube-dl before, but I switched because your code is more readable (mainly because I prefer C# over Python).
From what I can tell, you get the download links of videos the same way youtube-dl does. But youtube-dl won't try to download a video from its URL if that URL doesn't work. Some video formats are (I think always) uploaded as DASH segments (many small parts that have to be combined to get the full video). The two formats I have the most experience with are 136 (DASH MP4 720p, video only) and 137 (DASH MP4 1080p, video only).
These formats sometimes have a working download URL and sometimes do not. To download these formats consistently, you need to download the DASH segments and combine them locally (youtube-dl does that). If I try to download video format 136 of this video: https://www.youtube.com/watch?v=Lhw5xo67tdE with youtube-dl, it works. With YoutubeExplode, it does not.
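The "download the DASH segments and combine them locally" step can be sketched as follows: fetch the initialization segment first, then each media fragment in order, and append the bytes to a single output stream. The class and method names, the fetch delegate, and the assumption that the fragment URLs are already known (for example, from the manifest) are illustrative. Plain byte concatenation is assumed to produce a playable fragmented MP4, which is typical for fMP4 DASH streams but not guaranteed for every format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class DashSegmentJoiner
{
    // Downloads the initialization segment followed by every media fragment, in order,
    // and appends them to 'output'. 'fetch' returns the bytes for one URL; in real use
    // it could wrap an HttpClient call, ideally with retries.
    // Returns the total number of bytes written.
    public static async Task<long> DownloadAndJoinAsync(
        string initUrl,
        IEnumerable<string> fragmentUrls,
        Func<string, Task<byte[]>> fetch,
        Stream output)
    {
        long written = 0;

        byte[] init = await fetch(initUrl);
        await output.WriteAsync(init, 0, init.Length);
        written += init.Length;

        foreach (var url in fragmentUrls)
        {
            byte[] fragment = await fetch(url);
            await output.WriteAsync(fragment, 0, fragment.Length);
            written += fragment.Length;
        }

        await output.FlushAsync();
        return written;
    }
}
```

Fetching fragments one at a time keeps them in order; for the TensorFlow use case, the same loop could instead start a new output file every N fragments.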
I hope you can implement such a feature as well.
Here is the code I used to test YoutubeExplode's behavior with DASH format 136 compared to the regular format 22.