flagbug / YoutubeExtractor

A .NET library that lets you download videos from YouTube and/or extract their audio track (currently only for Flash videos).

Parse exception #332

Closed gen3vra closed 5 years ago

gen3vra commented 5 years ago

It's broken again. What do we change this time?

anilgit90 commented 5 years ago

Please look at pull request #333; I hope it will solve your problem.

SoftwareMagicIT commented 5 years ago

It's not working. The reported error is "Could not decipher signature", but the underlying exception, held in the ex variable, is a 404 Not Found. This started after the new EU copyright law came into effect on 13 Dec.

anilgit90 commented 5 years ago

Oh, so the issue is country-specific. My issue was resolved after I applied the fix from the pull request above. I will try using a VPN to test your specific case.

anilgit90 commented 5 years ago

Hi, could you please post the link to the video you are trying to download? Today I tried downloading through a VPN connected to a server in France, and I did not get any 404 error.

gen3vra commented 5 years ago

Couldn't use your solution @anilgit90 because your fork doesn't build the "Full" version of it, just the "Portable" one for some reason. And the portable one doesn't include the VideoDownloader class. Additionally, applying your changes to this repo results in a lot of YoutubeParseExceptions anyway. Does this work for you?

anilgit90 commented 5 years ago

All the changes were tested before I raised the pull request. I tried to reproduce your issue using a VPN, but I could not reproduce it. Could we coordinate over TeamViewer to resolve the issue you are facing?

SoftwareMagicIT commented 5 years ago

You can contact me by email. I can use AnyDesk and load the source code so you can see the issue on my computer.

anilgit90 commented 5 years ago

Yeah, sure. I will get back to you, just give me some time. Meanwhile, I was able to reproduce the issue; it seems to occur with some YouTube videos but not all. @GMR516 mentioned the parse exception, and I was able to reproduce that with one YouTube link. It looks like a regex matching issue. Let me see if I can fix it or pinpoint the cause. I will get back to you, @GMR516 and @SoftwareMagicIT, once I have something concrete.

gen3vra commented 5 years ago

@anilgit90 Thank you very much. Greatly appreciated.

SoftwareMagicIT commented 5 years ago

I investigated: this URL is not found: http://s.ytimg.com/yts/jsbin/player-ias-vflSzU_20/it_IT/base.js

OK, solved: in the decipher code I changed player-{0} to player_{0}, and now it works.

gen3vra commented 5 years ago

Your link doesn't seem to work @SoftwareMagicIT

SoftwareMagicIT commented 5 years ago

I changed my previous comment to "Solved". The link http://s.ytimg.com/yts/jsbin/player-ias-vflSzU_20/it_IT/base.js does not work, but that link is generated from http://s.ytimg.com/yts/jsbin/player-{0}.js, where {0} is filled in with a part of the page passed in as an argument. I changed the template to http://s.ytimg.com/yts/jsbin/player_{0}.js, and now it works.
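
For anyone following along, the fix described above amounts to changing one character in the URL template. A minimal sketch, assuming the {0} placeholder receives the path fragment scraped from the watch page (the value used here is read off the failing link quoted above; the class and method names are made up for illustration):

```csharp
using System;

public static class PlayerUrlDemo
{
    // Old template (hyphen after "player"), which produced the 404 reported in this thread.
    public const string OldTemplate = "http://s.ytimg.com/yts/jsbin/player-{0}.js";

    // New template (underscore after "player"), as described in the comment above.
    public const string NewTemplate = "http://s.ytimg.com/yts/jsbin/player_{0}.js";

    // Fills the placeholder the same way string.Format is used in DecipherWithVersion.
    public static string BuildPlayerUrl(string template, string cipherVersion)
    {
        return string.Format(template, cipherVersion);
    }

    public static void Main()
    {
        // Assumed value: the part of the failing link that follows "player-".
        string cipherVersion = "ias-vflSzU_20/it_IT/base";

        Console.WriteLine(BuildPlayerUrl(OldTemplate, cipherVersion));
        Console.WriteLine(BuildPlayerUrl(NewTemplate, cipherVersion));
    }
}
```

The second line printed has the same shape as the player_ URLs that are reported as working in this thread.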

anilgit90 commented 5 years ago

@SoftwareMagicIT that was the fix I was pushing for. Sorry for the delay. I was busy due to the new year.

SoftwareMagicIT commented 5 years ago

No problem and happy 2019 xD

anilgit90 commented 5 years ago

I hope the latest pull request solved the problem, @GMR516 @SoftwareMagicIT. Could you check it and let me know? @SoftwareMagicIT, the pull request contains the fix that you discovered.

SoftwareMagicIT commented 5 years ago

I tested it; it works fine.

ac-lap commented 5 years ago

The decryption seems to be failing for this url - "https://www.youtube.com/watch?v=YQHsXMglC9A". The decryption method fails to decrypt the signature, so on trying to access the 'decrypted' url, it gives 403. Can you guys give it a try?

SoftwareMagicIT commented 5 years ago

I tried it, and in my version it works fine.

ac-lap commented 5 years ago

@SoftwareMagicIT, were you able to access the video from the decrypted URL? For me the encrypted and decrypted signatures are the same.

I am assuming that in your version you didn't change anything in the public static string DecipherWithVersion(string cipher, string cipherVersion) method in Decipherer.cs. My jsUrl looks something like this: "http://s.ytimg.com/yts/jsbin/player_ias-vflWb9AD2/en_US/base.js"

SoftwareMagicIT commented 5 years ago

1) Replace http with https in every http URL in the project (this speeds up the download because it skips a redirect).
2) This is my DecipherWithVersion:

`public static string DecipherWithVersion(string cipher, string cipherVersion)
    {
        string jsUrl = string.Format("https://s.ytimg.com/yts/jsbin/player_{0}.js", cipherVersion);

        string js = HttpHelper.DownloadString(jsUrl);

        //Find "C" in this: var A = B.sig||C (B.s)
        string functNamePattern = @"\""signature"",\s?([a-zA-Z0-9\$]+)\(";

        var funcName = Regex.Match(js, functNamePattern).Groups[1].Value;

        if (funcName.Contains("$"))
        {
            funcName = "\\" + funcName; //Due To Dollar Sign Introduction, Need To Escape
        }

        string funcPattern = @"(?!h\.)" + @funcName + @"(\w+)\s*=\s*function\(\s*(\w+)\s*\)\s*{\s*\2\s*=\s*\2\.split\(\""\""\)\s*;(.+)return\s*\2\.join\(\""\""\)\s*}\s*;"; //Escape funcName string
        var funcBody = Regex.Match(js, funcPattern, RegexOptions.Singleline).Value; //Entire sig function
        var lines = funcBody.Split(';'); //Each line in sig function

        string idReverse = "", idSlice = "", idCharSwap = ""; //Hold name for each cipher method
        string functionIdentifier = "";
        string operations = "";

        foreach (var line in lines.Skip(1).Take(lines.Length - 2)) //Matches the funcBody with each cipher method. Only runs till all three are defined.
        {
            if (!string.IsNullOrEmpty(idReverse) && !string.IsNullOrEmpty(idSlice) &&
                !string.IsNullOrEmpty(idCharSwap))
            {
                break; //Break loop if all three cipher methods are defined
            }

            functionIdentifier = GetFunctionFromLine(line);
            string reReverse = string.Format(@"{0}:\bfunction\b\(\w+\)", functionIdentifier); //Regex for reverse (one parameter)
            string reSlice = string.Format(@"{0}:\bfunction\b\([a],b\).(\breturn\b)?.?\w+\.", functionIdentifier); //Regex for slice (return or not)
            string reSwap = string.Format(@"{0}:\bfunction\b\(\w+\,\w\).\bvar\b.\bc=a\b", functionIdentifier); //Regex for the char swap.

            if (Regex.Match(js, reReverse).Success)
            {
                idReverse = functionIdentifier; //If def matched the regex for reverse then the current function is defined as the reverse
            }

            if (Regex.Match(js, reSlice).Success)
            {
                idSlice = functionIdentifier; //If def matched the regex for slice then the current function is defined as the slice.
            }

            if (Regex.Match(js, reSwap).Success)
            {
                idCharSwap = functionIdentifier; //If def matched the regex for charSwap then the current function is defined as swap.
            }
        }

        foreach (var line in lines.Skip(1).Take(lines.Length - 2))
        {
            Match m;
            functionIdentifier = GetFunctionFromLine(line);

            if ((m = Regex.Match(line, @"\(\w+,(?<index>\d+)\)")).Success && functionIdentifier == idCharSwap)
            {
                operations += "w" + m.Groups["index"].Value + " "; //operation is a swap (w)
            }

            if ((m = Regex.Match(line, @"\(\w+,(?<index>\d+)\)")).Success && functionIdentifier == idSlice)
            {
                operations += "s" + m.Groups["index"].Value + " "; //operation is a slice
            }

            if (functionIdentifier == idReverse) //No regex required for reverse (reverse method has no parameters)
            {
                operations += "r "; //operation is a reverse
            }
        }

        operations = operations.Trim();

        return DecipherWithOperations(cipher, operations);
    }`
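
The method above only builds an operations string such as "w3 s2 r" and hands it to DecipherWithOperations, which is not shown in this thread. The sketch below is one plausible way such a string could be applied, assuming r reverses the signature, sN drops the first N characters, and wN swaps the first character with the one at index N (taken modulo the length). CipherOperationsDemo and Apply are hypothetical names, not the library's code:

```csharp
using System;
using System.Text;

public static class CipherOperationsDemo
{
    // Applies a space-separated operations string (e.g. "w3 s2 r") to a signature.
    // Assumes the signature is non-empty, as real signatures are.
    public static string Apply(string cipher, string operations)
    {
        string[] ops = operations.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string op in ops)
        {
            switch (op[0])
            {
                case 'r': // reverse the whole string
                    char[] chars = cipher.ToCharArray();
                    Array.Reverse(chars);
                    cipher = new string(chars);
                    break;

                case 's': // slice: drop the first N characters
                    cipher = cipher.Substring(int.Parse(op.Substring(1)));
                    break;

                case 'w': // swap: exchange the first character with the one at index N
                    int index = int.Parse(op.Substring(1)) % cipher.Length; // modulo is an assumption
                    var builder = new StringBuilder(cipher);
                    char first = builder[0];
                    builder[0] = builder[index];
                    builder[index] = first;
                    cipher = builder.ToString();
                    break;

                default:
                    throw new NotSupportedException("Unknown cipher operation: " + op);
            }
        }

        return cipher;
    }

    public static void Main()
    {
        Console.WriteLine(Apply("abcdef", "w3 s2 r"));
    }
}
```

Under these assumptions, Apply("abcdef", "w3 s2 r") goes "abcdef" -> "dbcaef" (swap) -> "caef" (slice) -> "feac" (reverse). If the encrypted and decrypted signatures come out identical, as reported above, one thing worth checking is whether the operations string ended up empty because none of the regexes matched.
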
anilgit90 commented 5 years ago

@ac-lap I tried with your URL, and it does give a 403 error.

I tried the changes mentioned by @SoftwareMagicIT, but it still gives a 403 error.

SoftwareMagicIT commented 5 years ago

If you want, you can download my compiled version: https://www.softwaremagic.it/freesoftware/setupvideoconverter17.zip After setup you can copy the DLL, or you can try my software and let me know whether it works.

digydigy commented 5 years ago

@SoftwareMagicIT @ac-lap

Here is another fix that doesn't require any "decryption"


DownloadUrlResolver.cs

Replace

line90: string videoTitle = GetVideoTitle(json);

line92: IEnumerable<ExtractionInfo> downloadUrls = ExtractDownloadUrls(json);

line94: IEnumerable<VideoInfo> infos = GetVideoInfos(downloadUrls, videoTitle).ToList();

with

IEnumerable<VideoInfo> infos = MyGetVideoInfos(json).ToList();

static IEnumerable<VideoInfo> MyGetVideoInfos(JObject json)
{
    var newjson = (string)json["args"]["player_response"];
    var json2 = JObject.Parse(newjson);

    var list = new List<VideoInfo>();

    foreach (var item in json2["streamingData"]["formats"])
    {
        var videoInfo = VideoInfo.Defaults.FirstOrDefault(x => x.FormatCode == (int)item["itag"]);
        if (videoInfo == null) continue;

        var v = new VideoInfo(videoInfo);

        v.DownloadUrl = (string)item["url"];
        v.Title = (string)json["args"]["title"];
        v.RequiresDecryption = false;

        list.Add(v);
    }
    return list;
}

SoftwareMagicIT commented 5 years ago

Tried it, good, it works fine!

SoftwareMagicIT commented 5 years ago

OK, I found a video that does not work: https://www.youtube.com/watch?v=4MULjbUhIz4

ColorTwist commented 5 years ago

Maybe because the video starts with "This video may be inappropriate for some users", which you need to confirm.

SoftwareMagicIT commented 5 years ago

Not in my country; no alert is shown. It simply fails to match this regex: var dataRegex = new Regex(@"ytplayer.config\s=\s({.+?}); The line string extractedJson = dataRegex.Match(pageSource).Result("$1"); then throws an error (no match).

digydigy commented 5 years ago

@SoftwareMagicIT, of course the match fails: the page needs user confirmation because the content is "inappropriate for some users". You can skip such videos for now, until someone finds a solution.

BTW: As far as I can remember, this project was not able to find those links even when it was working.

SoftwareMagicIT commented 5 years ago

Ah, OK. When I view the video while logged in to my Google account, it detects automatically that I'm allowed to see it. When I'm not logged in, the alert is shown. I understand.

digydigy commented 5 years ago

A simple (not much tested) alternative to all of this project's code.

Usage:

var result = await digydigy.YoutubeLinks.Get(url);


using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace digydigy
{
    public class YoutubeLinks
    {

        public static async Task<Model.Response> Get(string url)
        {
            using (var client = new HttpClient())
            {
                var retObj = new JObject();

                var html = await client.GetStringAsync(url);

                var live = new JProperty("live", null);
                var video = new JProperty("video", null);

                var dataRegex = new Regex(@"ytplayer\.config\s*=\s*(\{.+?\});", RegexOptions.Multiline);
                string extractedJson = dataRegex.Match(html).Result("$1");

                var jObj = JObject.Parse(extractedJson);

                retObj.Add(new JProperty("title", jObj["args"]["title"]));
                retObj.Add(live);
                retObj.Add(video);

                var jResponse = JObject.Parse((string)jObj["args"]["player_response"]);

                var hlsManifestUrl = jResponse["streamingData"]?["hlsManifestUrl"];
                var dashManifestUr = jResponse["streamingData"]?["dashManifestUrl"];

                if (hlsManifestUrl != null || dashManifestUr != null)
                {
                    live.Value = new JObject(
                            new JProperty("hls", hlsManifestUrl),
                            new JProperty("dash", dashManifestUr)
                        );
                }

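                // Null-conditional access, as for the manifest URLs above, so a missing streamingData does not throw.
                var formats = jResponse["streamingData"]?["formats"];
                var adaptiveFormats = jResponse["streamingData"]?["adaptiveFormats"];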

                if (formats != null || adaptiveFormats != null)
                {
                    video.Value = new JObject(
                            new JProperty("formats", jResponse["streamingData"]["formats"]),
                            new JProperty("adaptiveFormats", jResponse["streamingData"]["adaptiveFormats"])
                        );
                }

                return retObj.ToObject<Model.Response>();

            }
        }
    }

    public class Model
    {

        public class Live
        {
            public string hls { get; set; }
            public string dash { get; set; }
        }

        public class InitRange
        {
            public string start { get; set; }
            public string end { get; set; }
        }

        public class IndexRange
        {
            public string start { get; set; }
            public string end { get; set; }
        }

        public class Format
        {
            public int itag { get; set; }
            public string url { get; set; }
            public string mimeType { get; set; }
            public int bitrate { get; set; }
            public int width { get; set; }
            public int height { get; set; }
            public string lastModified { get; set; }
            public string contentLength { get; set; }
            public string quality { get; set; }
            public string qualityLabel { get; set; }
            public string projectionType { get; set; }
            public int averageBitrate { get; set; }
            public string audioQuality { get; set; }
            public string approxDurationMs { get; set; }
            public string audioSampleRate { get; set; }
        }

        public class AdaptiveFormat
        {
            public int itag { get; set; }
            public string url { get; set; }
            public string mimeType { get; set; }
            public int bitrate { get; set; }
            public int width { get; set; }
            public int height { get; set; }
            public InitRange initRange { get; set; }
            public IndexRange indexRange { get; set; }
            public string lastModified { get; set; }
            public string contentLength { get; set; }
            public string quality { get; set; }
            public int fps { get; set; }
            public string qualityLabel { get; set; }
            public string projectionType { get; set; }
            public int averageBitrate { get; set; }
            public string approxDurationMs { get; set; }
            public bool? highReplication { get; set; }
            public string audioQuality { get; set; }
            public string audioSampleRate { get; set; }
        }

        public class Video
        {
            public List<Format> formats { get; set; }
            public List<AdaptiveFormat> adaptiveFormats { get; set; }
        }

        public class Response
        {
            public string title { get; set; }
            public Live live { get; set; }
            public Video video { get; set; }

            public override string ToString()
            {
                return JsonConvert.SerializeObject(this, Formatting.Indented);
            }
        }
    }

}

ColorTwist commented 5 years ago

The code looks much more comfortable to work with compared to what we already have; very cool, @digydigy. I hope to try it out this week.

ColorTwist commented 5 years ago

@digydigy How would you handle errors with your method?

I added:

   public class Video
        {
            public List<Format> formats { get; set; }
            public List<AdaptiveFormat> adaptiveFormats { get; set; }
            public string error { get; set; } //added line
        }

And wrapped everything in the main function with:

 catch (Exception ex)
                {

                    var video = new JProperty("video", null);
                    retObj.Add(video);
                    video.Value = new JObject { new JProperty("error", ex.Message) };
                    return retObj.ToObject<Model.Response>();
                }

Then, in the return value, you check whether error is null or not. Maybe there are more elegant methods :)

digydigy commented 5 years ago

@ColorTwist the only method that may fail is dataRegex.Match(html). Check the Success property of the match:

var dataRegex = new Regex(@"ytplayer\.config\s*=\s*(\{.+?\});", RegexOptions.Multiline);
var match = dataRegex.Match(html);
if (!match.Success) throw new Exception("Can not extract youtube url.");
string extractedJson = match.Result("$1");

ColorTwist commented 5 years ago

I updated my error-checking method (see edit). Your method works nicely in a few tests. Appreciated, @digydigy.

digydigy commented 5 years ago

@ColorTwist BTW: I would handle the errors/exceptions where I invoke this method, not in it...

ColorTwist commented 5 years ago

You mean wrap var result = await digydigy.YoutubeLinks.Get(url); in a try/catch? Yes, that might look more elegant and correct :)
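
To make the suggestion concrete, here is a small sketch of handling the failure at the call site rather than inside the extraction method. It reuses the regex quoted above on two made-up page snippets (no network access) and throws when there is no match. ExtractionDemo and ExtractConfigJson are hypothetical names used only for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtractionDemo
{
    private static readonly Regex DataRegex =
        new Regex(@"ytplayer\.config\s*=\s*(\{.+?\});", RegexOptions.Multiline);

    // Throws instead of returning an error object, so the caller decides how to react.
    public static string ExtractConfigJson(string html)
    {
        Match match = DataRegex.Match(html);
        if (!match.Success)
        {
            throw new InvalidOperationException("Could not find ytplayer.config in the page.");
        }
        return match.Groups[1].Value;
    }

    public static void Main()
    {
        // Hypothetical page snippets standing in for real downloaded HTML.
        string[] pages =
        {
            "<script>ytplayer.config = {\"args\":{\"title\":\"demo\"}};</script>",
            "<html>confirmation page without a player config</html>"
        };

        foreach (string html in pages)
        {
            try
            {
                Console.WriteLine("Extracted: " + ExtractConfigJson(html));
            }
            catch (InvalidOperationException ex)
            {
                // The caller chooses what to do: skip, log, retry, show a message...
                Console.WriteLine("Skipped: " + ex.Message);
            }
        }
    }
}
```

The same pattern would apply to the awaited call: put await digydigy.YoutubeLinks.Get(url) inside the try block and keep the Response model free of error fields.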

ColorTwist commented 5 years ago

Hey, @digydigy Reopening for a question related to the code sample you mentioned previously in this thread. Using your code sample would it be possible to start from a specific time of the video instead of the beginning of the video?

digydigy commented 5 years ago

@ColorTwist You can use ffmpeg or a video library

For example, start at second 30 and download the next 50 seconds of video:

ffmpeg -ss 00:00:30 -t 50 -i "extracted_youtube_url" -c:v copy -c:a copy output.mp4
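
If you want to drive that command from C#, a rough sketch could build the same argument list from a start offset and a duration. It assumes ffmpeg is installed and on the PATH; FfmpegTrimDemo and BuildArguments are hypothetical names, and the process is only prepared here, not started:

```csharp
using System;
using System.Diagnostics;

public static class FfmpegTrimDemo
{
    // Builds the arguments of the command above: seek to start, keep duration, copy both streams.
    public static string BuildArguments(TimeSpan start, TimeSpan duration, string inputUrl, string outputPath)
    {
        string startText = start.ToString(@"hh\:mm\:ss"); // e.g. 00:00:30
        int seconds = (int)duration.TotalSeconds;          // e.g. 50

        return "-ss " + startText +
               " -t " + seconds +
               " -i \"" + inputUrl + "\"" +
               " -c:v copy -c:a copy" +
               " \"" + outputPath + "\"";
    }

    public static void Main()
    {
        string args = BuildArguments(
            TimeSpan.FromSeconds(30),
            TimeSpan.FromSeconds(50),
            "extracted_youtube_url", // placeholder, as in the command above
            "output.mp4");

        // Assumption: an ffmpeg executable is available on the PATH.
        var startInfo = new ProcessStartInfo("ffmpeg", args) { UseShellExecute = false };

        // Call Process.Start(startInfo) to actually run it; here we only print the command line.
        Console.WriteLine(startInfo.FileName + " " + startInfo.Arguments);
    }
}
```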

ColorTwist commented 5 years ago

I see, thanks. I've never tried ffmpeg; that's a good direction. By the way, I tried again the elegant sample you provided on 14 Jan (in this thread), and for some reason it has stopped working, giving the error "No Formats or adaptiveFormats".