fent / node-ytdl-core

YouTube video downloader in JavaScript.
MIT License

Not working! #1305

Open leote2001 opened 4 months ago

leote2001 commented 4 months ago

I get an error when I try to download a video.

Dmytro-Tihunov commented 4 months ago

Same here: https://github.com/fent/node-ytdl-core/issues/1295

corwin-of-amber commented 4 months ago

FWIW, a fix was found by this developer, although it would need porting: https://github.com/distubejs/ytdl-core

Dmytro-Tihunov commented 4 months ago

@corwin-of-amber, have you managed to implement it? Do you mean IP rotation? For me it only works once per IP.

corwin-of-amber commented 4 months ago

Interesting, @Dmytro-Tihunov. I have not investigated the fix yet, but it looks like it involves some regex updates; I did not see IPs mentioned anywhere in the commits. I was able to download multiple videos with it, although there seems to be some problem with the sound. https://github.com/distubejs/ytdl-core/commit/3df824e57fe4ce3037a91efd124b729dea38c01f

corwin-of-amber commented 4 months ago

Ok, the problem is definitely the nTransform function: https://github.com/fent/node-ytdl-core/blob/9e15c7381f1eba188aba8b536097264db6ad3f7e/lib/sig.js#L57

which can be extracted with this regexp: https://github.com/distubejs/ytdl-core/blob/7f7db1062069f13063cf0ee5d652ed33b42e28cb/lib/sig.js#L56

    const N_TRANSFORM_REGEXP = 'function\\(\\s*(\\w+)\\s*\\)\\s*\\{' +
      'var\\s*(\\w+)=(?:\\1\\.split\\(""\\)|String\\.prototype\\.split\\.call\\(\\1,""\\)),' +
      '\\s*(\\w+)=(\\[.*?]);\\s*\\3\\[\\d+]' +
      '(.*?try)(\\{.*?})catch\\(\\s*(\\w+)\\s*\\)\\s*\\' +
      '{\\s*return"enhanced_except_([A-z0-9-]+)"\\s*\\+\\s*\\1\\s*}' +
      '\\s*return\\s*(\\2\\.join\\(""\\)|Array\\.prototype\\.join\\.call\\(\\2,""\\))};';

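
To see what the pattern captures, it can be run against a small hand-written string. The sample below is a hypothetical, heavily simplified stand-in with the same shape as the n-transform function (split the input into an array, a lookup table, a try/catch that returns "enhanced_except_...", then a join); it is not real player code.

```javascript
// N_TRANSFORM_REGEXP as quoted above from the distube fork's lib/sig.js.
const N_TRANSFORM_REGEXP = 'function\\(\\s*(\\w+)\\s*\\)\\s*\\{' +
  'var\\s*(\\w+)=(?:\\1\\.split\\(""\\)|String\\.prototype\\.split\\.call\\(\\1,""\\)),' +
  '\\s*(\\w+)=(\\[.*?]);\\s*\\3\\[\\d+]' +
  '(.*?try)(\\{.*?})catch\\(\\s*(\\w+)\\s*\\)\\s*\\' +
  '{\\s*return"enhanced_except_([A-z0-9-]+)"\\s*\\+\\s*\\1\\s*}' +
  '\\s*return\\s*(\\2\\.join\\(""\\)|Array\\.prototype\\.join\\.call\\(\\2,""\\))};';

// Hypothetical, heavily simplified function with the n-transform's overall shape.
const sample =
  'Xy=function(a){var b=a.split(""),c=[1,2];c[0]=0;' +
  'try{c[1]=1}catch(d){return"enhanced_except_AbC1-x"+a}return b.join("")};';

// The 's' (dotAll) flag lets '.' span newlines, as in the patch.
const mo = sample.match(new RegExp(N_TRANSFORM_REGEXP, 's'));

console.log(mo !== null); // true
console.log(mo[1], mo[2], mo[3]); // a b c  (argument, split array, lookup table)
console.log(mo[0].startsWith('function(a){')); // true: the match is the function expression itself
```

Note that the captured text starts at `function(`, not at the variable name, which is why the patch below prepends its own `var nxx=` before evaluating it.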
corwin-of-amber commented 4 months ago

I was able to get the correct sig by replacing the function extractNCode above with:

  const extractNCode = () => {
    const N_TRANSFORM_REGEXP = 'function\\(\\s*(\\w+)\\s*\\)\\s*\\{' +
      'var\\s*(\\w+)=(?:\\1\\.split\\(""\\)|String\\.prototype\\.split\\.call\\(\\1,""\\)),' +
      '\\s*(\\w+)=(\\[.*?]);\\s*\\3\\[\\d+]' +
      '(.*?try)(\\{.*?})catch\\(\\s*(\\w+)\\s*\\)\\s*\\' +
      '{\\s*return"enhanced_except_([A-z0-9-]+)"\\s*\\+\\s*\\1\\s*}' +
      '\\s*return\\s*(\\2\\.join\\(""\\)|Array\\.prototype\\.join\\.call\\(\\2,""\\))};';

    let mo = body.match(new RegExp(N_TRANSFORM_REGEXP, 's'));
    if (mo) {
      let fnbody = mo[0];
      functions.push('var nxx=' + fnbody + 'nxx(ncode);');
    }
  };
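
The patch wraps the matched function expression as `var nxx=<function>;nxx(ncode);`, so evaluating that snippet yields the transformed value. A minimal sketch of the round trip using Node's built-in vm module, assuming the snippet is run in a fresh context with `ncode` bound (roughly how ytdl-core runs its n-transform script). The toy function just reverses its input and is not real player code.

```javascript
const vm = require('vm');

// The pattern from the patch above.
const N_TRANSFORM_REGEXP = 'function\\(\\s*(\\w+)\\s*\\)\\s*\\{' +
  'var\\s*(\\w+)=(?:\\1\\.split\\(""\\)|String\\.prototype\\.split\\.call\\(\\1,""\\)),' +
  '\\s*(\\w+)=(\\[.*?]);\\s*\\3\\[\\d+]' +
  '(.*?try)(\\{.*?})catch\\(\\s*(\\w+)\\s*\\)\\s*\\' +
  '{\\s*return"enhanced_except_([A-z0-9-]+)"\\s*\\+\\s*\\1\\s*}' +
  '\\s*return\\s*(\\2\\.join\\(""\\)|Array\\.prototype\\.join\\.call\\(\\2,""\\))};';

// Hypothetical player fragment; this toy "n-transform" merely reverses its input.
const body =
  'var foo=1;Xy=function(a){var b=a.split(""),c=[0];c[0];' +
  'try{b.reverse()}catch(d){return"enhanced_except_Q"+a}return b.join("")};';

// As in the patch: wrap the matched function expression so that the
// script's final expression statement is nxx(ncode).
const functions = [];
const mo = body.match(new RegExp(N_TRANSFORM_REGEXP, 's'));
if (mo) functions.push('var nxx=' + mo[0] + 'nxx(ncode);');

// Assumption: the snippet runs in a fresh context with `ncode` bound; the
// value of its last expression statement is what runInNewContext returns.
const script = functions.length ? new vm.Script(functions[0]) : null;
const result = script ? script.runInNewContext({ ncode: 'abc123' }) : null;
console.log(result); // 321cba
```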

This is a crude patch, though, and not idiomatic for this library. We should think of something cleaner.

corwin-of-amber commented 4 months ago

Better patch (although I am not sure how robust it is): replace https://github.com/fent/node-ytdl-core/blob/9e15c7381f1eba188aba8b536097264db6ad3f7e/lib/sig.js#L58 with

    let functionName = utils.between(body, 'c=a.get(b))&&(c=', '(c)');

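
The fixed anchor only works while the minifier keeps those exact variable names. A small illustration with a trimmed copy of the between helper (string delimiters only, same logic as the utils.between copy posted later in this thread) and two hypothetical minified fragments that differ only in variable naming:

```javascript
// Trimmed copy of utils.between (string delimiters only): returns the text
// between the first `left` and the next `right`, or '' if either is missing.
function between(haystack, left, right) {
  let pos = haystack.indexOf(left);
  if (pos === -1) return '';
  haystack = haystack.slice(pos + left.length);
  pos = haystack.indexOf(right);
  if (pos === -1) return '';
  return haystack.slice(0, pos);
}

// Hypothetical minified fragments; only the local variable names differ.
const withC = 'if((c=a.get(b))&&(c=Ema(c),a.set(b,c)))';
const withB = 'if((b=a.get(c))&&(b=Ema(b),a.set(c,b)))';

const anchor = 'c=a.get(b))&&(c=';
console.log(between(withC, anchor, '(c)')); // Ema
console.log(JSON.stringify(between(withB, anchor, '(c)'))); // "" : the anchor no longer matches
```
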
benkaiser commented 4 months ago

Sadly, the above patches seem to fix some low-quality formats but not others. On an example video (v=1ec4gu5uJ6U), I was able to load the 360p mp4 and the 144p mp4, but all others returned a 403.

AnneAlbert-wt commented 4 months ago


This one works for me, thanks. The last time these 403 errors were thrown, I switched to the distube fork of ytdl (https://github.com/distubejs/ytdl-core), which worked but now throws 403 errors as well. Switching back to ytdl-core with this functionName fix works.

hextor1 commented 4 months ago

Not working again; YouTube has updated their algorithm again. @corwin-of-amber, downloading YouTube videos and audio suddenly stopped working.

gatecrasher777 commented 4 months ago

The n code extraction is one issue. The 403 on GET requests, which affects videos longer than 1 minute, is another. GET requests still work on the 360p default format streams (with audio included) but not on adaptive formats. The challenge seems to be converting GET requests to POST requests; otherwise, the best quality you will get is 360p.
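
Until that is solved, a stopgap consistent with the comment above is to restrict downloads to muxed (audio+video) formats, which still work over GET. A minimal sketch over hypothetical format objects; the field names itag, hasVideo, hasAudio and height are assumed to resemble ytdl-core's format metadata, and the values are illustrative only.

```javascript
// Hypothetical format list (illustrative values, not fetched from YouTube).
const formats = [
  { itag: 137, hasVideo: true, hasAudio: false, height: 1080 }, // adaptive video-only
  { itag: 140, hasVideo: false, hasAudio: true },               // adaptive audio-only
  { itag: 18, hasVideo: true, hasAudio: true, height: 360 },    // muxed
  { itag: 17, hasVideo: true, hasAudio: true, height: 144 },    // muxed
];

// Keep only muxed formats and return the tallest one, or null if none exist.
// filter() returns a new array, so sorting does not reorder the caller's list.
function bestMuxedFormat(list) {
  const muxed = list.filter(f => f.hasVideo && f.hasAudio);
  muxed.sort((x, y) => (y.height || 0) - (x.height || 0));
  return muxed[0] || null;
}

console.log(bestMuxedFormat(formats).itag); // 18
```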

hextor1 commented 4 months ago

So what should we do now? Is there any solution?

gatecrasher777 commented 4 months ago

There will always be a solution. Here, the solution will be to emulate what the YouTube site code does in the browser to fetch the streams, but that could take a while to figure out and code. The POST requests also seem to include an encrypted payload, so it is a tricky exercise.

AnneAlbert-wt commented 4 months ago

Using @distube/ytdl-core (v4.13.7) works for us on some devices but not all. Someone suggested that Google is doing some A/B testing.

corwin-of-amber commented 3 months ago

Regarding the nTransform function: Using the Wayback Machine (https://web.archive.org/) I was able to look at previous versions of the player. The assignment syntax varies, but in the 6 versions I observed at least two things remain constant:

Given this, I propose using a regex to capture this convention, in the hopes that it will give us some breathing space for the near future. WDYT?

hextor1 commented 3 months ago

So what's the solution? @corwin-of-amber

corwin-of-amber commented 3 months ago

Currently, it looks like this:

    let mo = body.match(/index\.m3u8".*=(.*?)[.]set\(/);
    let functionName = mo && mo[1].split('(')[0];

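
The two lines above, plus the array lookup that the follow-up jsfiddle in this thread adds, can be wrapped into one null-safe helper. A sketch under the assumption that the call site looks like `c=NAME(c),a.set(...)` or `c=ARR[i](c),a.set(...)`; findNFunctionName is a hypothetical name, and unlike the split('[')[0]/between shortcut it picks the array element by index.

```javascript
// Hypothetical helper built around the two-line regex above.
function findNFunctionName(body) {
  // Same pattern as above: capture the callee assigned just before `.set(`.
  const mo = body.match(/index\.m3u8".*=(.*?)[.]set\(/);
  if (!mo) return null;
  const callee = mo[1].split('(')[0]; // e.g. "Ema" or "zDa[0]"

  const indexed = callee.match(/^([\w$]+)\[(\d+)\]$/);
  if (!indexed) return callee; // direct call, no array indirection

  // Find `var zDa=[...]`; `$` is legal in identifiers, so escape it for RegExp.
  const arrName = indexed[1].replace(/[$]/g, '\\$&');
  const arr = body.match(new RegExp(`var ${arrName}=\\[([^\\]]*)\\]`));
  if (!arr) return null;
  const items = arr[1].split(',').map(s => s.trim());
  return items[Number(indexed[2])] || null;
}

// The jsfiddle body from this thread, and a hypothetical two-element variant.
const body1 = 'var zDa=[Ema];\n' +
  'a.j.file==="index.m3u8"&&(delete a.j.file,a.path+="/file/index.m3u8");' +
  'a.B="";a.url="";a.D&&(b="nn"[+a.D],vL(a),c=a.j[b]||null)&&' +
  '(c=zDa[0](c),a.set(b,c),zDa.length||Ema(""))}};';
const body2 = 'var Q$a=[Foo,Bar];a.j.file==="index.m3u8"&&(c=Q$a[1](c),a.set(b,c))';

console.log(findNFunctionName(body1)); // Ema
console.log(findNFunctionName(body2)); // Bar
```
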
hextor1 commented 3 months ago

Hello, can you tell me which line I need to replace, and in which file I should add this? Please share the full code here, @corwin-of-amber.

corwin-of-amber commented 3 months ago

Same as here @hextor1 https://github.com/fent/node-ytdl-core/issues/1305#issuecomment-2253373635

I.e. this is the line that needs to be replaced with the two lines above: https://github.com/fent/node-ytdl-core/blob/9e15c7381f1eba188aba8b536097264db6ad3f7e/lib/sig.js#L58

    let mo = body.match(/index\.m3u8".*=(.*?)[.]set\(/);
    let functionName = mo && mo[1].split('(')[0];

I want to see that it keeps working for at least a few days before proposing it as a patch.

hextor1 commented 3 months ago

Hello, do I need to add this line here?

    let functionName = mo && mo[1].split('(')[0];

Do I need to replace line 58, @corwin-of-amber?

hextor1 commented 3 months ago

Here is my previous code. Please tell me where I should put this new code:

    let mo = body.match(/index\.m3u8".*=(.*?)[.]set\(/);
    let functionName = mo && mo[1].split('(')[0];

Old code:

    const extractNCode = () => {
      let functionName = utils.between(body, 'b=a.j.n||null)&&(b=', '(b)');
      if (functionName.includes('[')) functionName = utils.between(body, `var ${functionName.split('[')[0]}=[`, `]`);
      if (functionName && functionName.length) {
        const functionStart = `${functionName}=function(a)`;
        const ndx = body.indexOf(functionStart);
        if (ndx >= 0) {
          const subBody = body.slice(ndx + functionStart.length);
          const functionBody = `var ${functionStart}${utils.cutAfterJS(subBody)};${functionName}(ncode);`;
          functions.push(functionBody);
        }
      }
    };

gatecrasher777 commented 3 months ago

In #1301 I proposed the following, which avoids the problem of the nCode function name being obfuscated in ever-changing layers of difficulty by identifying the nCode block directly and deriving its function name from it.

  const extractNCode = () => {
    // Characters that may appear in an (obfuscated) function name.
    const alphanum = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.$_0123456789';
    let functionName = '';
    // Locate a fragment known to occur inside the n-transform function body.
    let clue = body.indexOf('enhanced_except');
    if (clue < 0) clue = body.indexOf('String.prototype.split.call(a,"")');
    if (clue < 0) clue = body.indexOf('Array.prototype.join.call(b,"")');
    if (clue > 0) {
      // Walk backwards from the nearest preceding "=function(a){" to read the name.
      // The `>= 0` bound also stops the loop when lastIndexOf finds nothing (-1 - 1 = -2).
      let nstart = body.lastIndexOf('=function(a){', clue) - 1;
      while (nstart >= 0 && alphanum.includes(body.charAt(nstart))) {
        functionName = body.charAt(nstart) + functionName;
        nstart--;
      }
    }
    if (functionName && functionName.length) {
      const functionStart = `${functionName}=function(a)`;
      const ndx = body.indexOf(functionStart);
      if (ndx >= 0) {
        const subBody = body.slice(ndx + functionStart.length);
        const functionBody = `var ${functionStart}${utils.cutAfterJS(subBody)};${functionName}(ncode);`;
        functions.push(functionBody);
      }
    }
  };

To reiterate: the 403 errors have nothing to do with the n transformation.
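
The name-finding half of the clue approach can be tried on its own against a synthetic body. A hypothetical standalone sketch (findNameByClue is an illustrative name; it returns only the name and leaves out the utils.cutAfterJS extraction step), using the same clue strings and name characters as above:

```javascript
// Characters accepted as part of a function name (same set as `alphanum` above).
const NAME_CHARS = /[A-Za-z0-9.$_]/;

// Hypothetical helper: find a known fragment of the n-transform body, then
// read the identifier just before the nearest preceding "=function(a){".
function findNameByClue(body) {
  const clues = ['enhanced_except', 'String.prototype.split.call(a,"")', 'Array.prototype.join.call(b,"")'];
  let clue = -1;
  for (const c of clues) {
    clue = body.indexOf(c);
    if (clue >= 0) break;
  }
  if (clue < 0) return '';
  const marker = body.lastIndexOf('=function(a){', clue);
  if (marker < 0) return '';
  let name = '';
  // Walk backwards; `i >= 0` stops cleanly at the start of the string.
  for (let i = marker - 1; i >= 0 && NAME_CHARS.test(body.charAt(i)); i--) {
    name = body.charAt(i) + name;
  }
  return name;
}

// Hypothetical body, not real player code.
const sampleBody =
  'var x=1;Ema=function(a){var b=a.split("");try{}catch(c){return"enhanced_except_Q"+a}return b.join("")};';
console.log(findNameByClue(sampleBody)); // Ema
```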

hextor1 commented 3 months ago

This one is a slightly better patch than the previous one, @corwin-of-amber @gatecrasher777:

    const extractNCode = () => {
      let functionName = utils.between(body, 'b=a.j.n||null)&&(b=', '(b)');
      if (functionName.includes('[')) functionName = utils.between(body, `var ${functionName.split('[')[0]}=[`, `]`);
      if (functionName && functionName.length) {
        const functionStart = `${functionName}=function(a)`;
        const ndx = body.indexOf(functionStart);
        if (ndx >= 0) {
          const subBody = body.slice(ndx + functionStart.length);
          const functionBody = `var ${functionStart}${utils.cutAfterJS(subBody)};${functionName}(ncode);`;
          functions.push(functionBody);
        }
      }
    };

gatecrasher777 commented 3 months ago

Sure, it will give the right result for now, but it is still trying to find the function name from a moving target.

hextor1 commented 3 months ago

You're right. Sometimes it finds the function, and when I refresh the page it works. I hope someone finds a better and more accurate solution in the future, @gatecrasher777.

corwin-of-amber commented 3 months ago

I agree with @gatecrasher777, my last patch is still heuristic, but I have tried it against the last 6 versions of player_ias (dates 05-28, 06-04, 06-05, 07-05, 07-15, 08-03) and it works consistently on all of them. The "clue" approach that is based on knowing a piece of the function code is also something that I considered. It is hard to say which is more robust/less brittle. Perhaps we need to try both and collect statistics?

Also, I would like to state that failing to find the n-transform function does indeed result in a 403 error; although, there may be other 403 errors that occur even with the right n-transform (esp. with high-bitrate formats).
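
One way to act on "try both and collect statistics" is a small harness that runs each candidate extractor over saved player bodies with a known correct answer and counts hits. A hypothetical sketch; the two toy samples and extractors below stand in for real archived player_ias versions and the real heuristics.

```javascript
// Hypothetical scoring harness: for each sample with a known correct function
// name, run every extractor and count exact matches.
function scoreExtractors(samples, extractors) {
  const stats = {};
  for (const name of Object.keys(extractors)) stats[name] = { correct: 0, total: 0 };
  for (const { body, expected } of samples) {
    for (const [name, extract] of Object.entries(extractors)) {
      let got = null;
      try {
        got = extract(body);
      } catch {
        got = null; // an extractor that throws counts as a miss
      }
      stats[name].total += 1;
      if (got === expected) stats[name].correct += 1;
    }
  }
  return stats;
}

// Toy extractors modelled on the regex approach from this thread.
const naiveRegex = body => {
  const mo = body.match(/index\.m3u8".*=(.*?)[.]set\(/);
  return mo ? mo[1].split('(')[0] : null;
};
const regexWithArray = body => {
  const name = naiveRegex(body);
  if (!name || !name.includes('[')) return name;
  // Resolve "zDa[0]" by reading the text between "var zDa=[" and the next "]".
  const prefix = `var ${name.split('[')[0]}=[`;
  const start = body.indexOf(prefix);
  if (start < 0) return null;
  const end = body.indexOf(']', start + prefix.length);
  return end < 0 ? null : body.slice(start + prefix.length, end);
};

// Hypothetical samples, not real archived player code.
const samples = [
  { body: 'var zDa=[Ema];a.j.file==="index.m3u8"&&(c=zDa[0](c),a.set(b,c))', expected: 'Ema' },
  { body: 'a.j.file==="index.m3u8"&&(c=Foo(c),a.set(b,c))', expected: 'Foo' },
];

const stats = scoreExtractors(samples, { naiveRegex, regexWithArray });
console.log(stats); // naiveRegex gets 1 of 2 right, regexWithArray 2 of 2
```

Scoring against a known expected name, rather than just "found something", matters here: the naive regex does return a value for the first sample (zDa[0]), but it is the wrong one.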

gatecrasher777 commented 3 months ago

@corwin-of-amber Another breaking change occurred today. Your code in jsfiddle...

let utils = {
  // Returns the text between `left` and the next occurrence of `right`, or ''.
  between: (haystack, left, right) => {
    let pos;
    if (left instanceof RegExp) {
      const match = haystack.match(left);
      if (!match) { return ''; }
      pos = match.index + match[0].length;
    } else {
      pos = haystack.indexOf(left);
      if (pos === -1) { return ''; }
      pos += left.length;
    }
    haystack = haystack.slice(pos);
    pos = haystack.indexOf(right);
    if (pos === -1) { return ''; }
    haystack = haystack.slice(0, pos);
    return haystack;
  }
}
let body = `
var zDa=[Ema];
a.j.file==="index.m3u8"&&(delete a.j.file,a.path+="/file/index.m3u8");a.B="";a.url="";a.D&&(b="nn"[+a.D],vL(a),c=a.j[b]||null)&&(c=zDa[0](c),a.set(b,c),zDa.length||Ema(""))}};
`;
let mo = body.match(/index\.m3u8".*=(.*?)[.]set\(/);
let functionName = mo && mo[1].split('(')[0];
// If the callee is an array element such as zDa[0], read the contents of `var zDa=[...]`.
if (functionName && functionName.includes('[')) functionName = utils.between(body, `var ${functionName.split('[')[0]}=[`, `]`);
console.log(functionName);

Outputs Ema which is correct. The clue method also worked btw.

hextor1 commented 3 months ago

Where should this be placed? Please share the file location and line number.

gatecrasher777 commented 3 months ago

There is no file. It is just some shorthand code to test @corwin-of-amber's functionName extraction method. You can paste it into the javascript box on https://jsfiddle.net/ and run it.