fent / node-ytdl-core

YouTube video downloader in javascript.
MIT License
4.49k stars 793 forks

Localized forbidden content when using source URLs in a client #936

Closed thedaviddelta closed 3 years ago

thedaviddelta commented 3 years ago

Hello.

I'm developing a full-stack application that uses node-ytdl-core at the back-end to scrape the YouTube / YouTube Music audio source from an ID, and sends that source URL to the client to play it in an HTML audio element at the front-end.

Everything works fine locally, but the problem appears when it's hosted on a server in production. The server is in the US and the users who play the sources in the client may be anywhere. The YouTube videos still work completely fine, but the YouTube Music songs return 403 Forbidden errors.

This is probably because the YouTube Music content is under DRM, and users in a different location from where it was scraped (the server's location) may not have permission to play it, i.e. I'm in Spain and can't play that content in my client, but it works if I use a US VPN.

I've been thinking about how to solve this, like trying to change the ip param in the URL, but everything breaks because of the signature, and obviously I can neither intercept the TCP request and change the request IP nor use client IPs as proxies.
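To make the constraint concrete, here is a minimal sketch that only reads the `ip` query parameter mentioned above and compares it with the address of the client that will play the audio. It assumes that parameter holds the address the URL was issued to. The example URL, its host and the `sig` value are made up for illustration, and real source URLs carry more parameters.

```typescript
// Returns the address stored in the URL's `ip` query parameter, or null if absent.
function lockedIp(sourceUrl: string): string | null {
    return new URL(sourceUrl).searchParams.get("ip");
}

// True when the URL's `ip` parameter matches the given client address.
function issuedFor(sourceUrl: string, clientIp: string): boolean {
    return lockedIp(sourceUrl) === clientIp;
}

// Hypothetical source URL: host, parameters and values are illustrative only.
const exampleUrl = "https://media.example.com/videoplayback?ip=203.0.113.7&sig=abc123";
```

A check like this can only detect the mismatch. Editing the parameter would not help, since the signature would no longer match.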

Do you have any idea how this could be solved using this library, aside from rewriting the app and trying to use buffers instead of source URLs? Thanks in advance.

TimeForANinja commented 3 years ago

If i understood you right you call ytdl-core on the backend and then pass the format url to the client? If so then that shouldn't even work for regular youtube videos 🤔

thedaviddelta commented 3 years ago

Hi and thanks for your reply.

But I don't understand why it shouldn't work. It just uses a valid media source URL. The problem comes (I think, based on the tests I did) when there's a regional difference, because it seems that YouTube restricts its URLs depending on the location. You can try it yourself by cloning this project and installing its dependencies.

So I'd like to ask any project collaborator (i.e. yourself) if you have any idea for a workaround using the package (i.e. by specifying the IP or country to scrape the URL from, or any other idea) or by any other means, because the only thing that comes to my mind is to refactor the front-end using streams and buffers, which is very buggy compared to using the HTML audio tag.

Regards, David.

TimeForANinja commented 3 years ago

It just uses a valid media source URL.

'cause they should be ip locked... and from what i understood the client requesting the url is not the client playing it

An alternative Solution to using your server as a Proxy is to webpack ytdl-core and run it completely client sided

thedaviddelta commented 3 years ago

That would be pretty amazing, but wouldn't it be blocked by CORS then? I ask because I don't know exactly if any of the endpoints that ytdl-core uses for scraping blocks CORS. Thank you.

TimeForANinja commented 3 years ago

there are cors problems but i also know ppl found ways around it not my metier thou

thedaviddelta commented 3 years ago

Hi again. I've tried to use node-ytdl-core client-side with the default Next.js Webpack config. The library worked well, but the URL didn't get fetched because of a CORS error. Do you know if I should modify any Webpack config for this to work? Thanks again.

thedaviddelta commented 3 years ago

there are cors problems but i also know ppl found ways around it not my metier thou

Oh sorry, I just replied at the same time. Don't worry, but I don't know how to avoid CORS either, apart from fetching the information server-side. I hope someone sees this and can help at this point. Thanks anyway.

thedaviddelta commented 3 years ago

Hello again. Sorry for being annoying. I've been testing piping the node-ytdl-core response into the endpoint response, and replacing the client-side audio URL with my own endpoint URL. It seems to work well, but now I can't jump to a specific moment of the audio (by changing the currentTime of the HTML audio tag). It seems more like a stream problem than a client-side problem, because if I open the URL in my browser so that it plays directly thanks to the MIME type, I can't jump either. I don't know why this happens, but I think it may be related to the lack of a Content-Length header on the response.

thedaviddelta commented 3 years ago

Ok I've been working on this the whole morning and this is what I can tell:

It seems what I described last night is a Chromium-only problem, as Firefox (and Gecko) seem to work well. This is because, in Firefox, the HTML audio tag doesn't request data using the Range header, and is able to load the full file and seek within it by itself.

But in the case of Chromium-based browsers, the audio tag requests the media using Range (by default bytes=0-) and is not able to seek. However, if you add the Content-Length header (obtained from the progress event) and the Accept-Ranges header, it is able to seek. It needs the first to know the size of the file to seek within, and the second to recognize the ability to seek.

The problem now is that, once these headers are added, Chromium requests the endpoint several times (at least 2 at the start of the playback) with different ranges (even though the range has no real effect on the response), even if the responses are cached, and I haven't found a way of avoiding it.

Here is, for the moment, the relevant fragment of my endpoint:

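import ytdl, { videoFormat } from "ytdl-core";

// Advertise byte-range support so Chromium enables seeking.
res.setHeader("Accept-Ranges", "bytes");
ytdl(id, { quality: "highestaudio" })
    .on("info", (_, format: videoFormat) => {
        // Forward the MIME type of the chosen format, when it is known.
        if (format.mimeType) res.setHeader("Content-Type", format.mimeType);
    })
    .on("progress", (chunkLength: number, downloaded: number, total: number) => {
        // Set Content-Length once, from the total size reported by the first progress event.
        // Skip it if the headers were already sent, since setHeader would throw in that case.
        if (!res.headersSent && !res.hasHeader("Content-Length")) {
            res.setHeader("Content-Length", total);
        }
    })
    .pipe(res);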
TimeForANinja commented 3 years ago

a) The format should already include the content-length, no need to wait for the progress event. example: https://github.com/fent/node-ytdl-core/blob/master/example/info.json#L102

b) you said Chromium does multiple (min 2) requests to the resource - do the ranges differ? if they do you should be able to simply parse the range from the request header. ytdl-core accepts that as a param: https://github.com/fent/node-ytdl-core#ytdlurl-options
if your browser does multiple requests to e.g. the first 100 bytes you'd have to implement some caching manually or just live with the added requests. i don't think the cdn part of youtube is ratelimited....
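A minimal sketch of the parsing step in b), handling only the single-range forms mentioned in this thread (`bytes=0-` and `bytes=start-end`). The `ByteRange` type and the function name are illustrative, not part of ytdl-core.

```typescript
// A byte range with inclusive bounds; `end` is omitted for open-ended ranges like "bytes=0-".
interface ByteRange {
    start: number;
    end?: number;
}

// Parses a single-range header such as "bytes=0-" or "bytes=100-199".
// Returns null for anything else (missing header, suffix ranges, multiple ranges).
function parseRangeHeader(header: string | undefined): ByteRange | null {
    if (!header) return null;
    const match = /^bytes=(\d+)-(\d*)$/.exec(header.trim());
    if (!match) return null;
    const start = Number(match[1]);
    if (match[2] === "") return { start };
    const end = Number(match[2]);
    return end >= start ? { start, end } : null;
}
```

The result has the `{ start, end }` shape that the README linked above describes for the `range` option, so it could presumably be passed through as `ytdl(id, { quality: "highestaudio", range })`.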

thedaviddelta commented 3 years ago

Thanks for your reply.

The format should already include the content-length

I know but I've read in another issue that it's sometimes not included, so I set it on both events (by checking if it's already set).
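A small sketch of that fallback, assuming the format exposes its size as a numeric string under `contentLength` (as in the info.json example linked above) and that the progress event reports the total as a number. The helper name is hypothetical.

```typescript
// Picks the response size: the format's own `contentLength` when it is a valid
// positive integer, otherwise the total reported by a "progress" event, else null.
function resolveContentLength(formatLength: string | undefined, progressTotal?: number): number | null {
    const parsed = Number(formatLength);
    if (formatLength && Number.isInteger(parsed) && parsed > 0) return parsed;
    if (progressTotal !== undefined && progressTotal > 0) return progressTotal;
    return null;
}
```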

do the ranges differ?

Yes they do, and that's why I can't cache the responses: each new range forces a fresh request. I've seen that I can pass those ranges to the node-ytdl-core library, but it doesn't fit this case because it would change the file size and duration, so I just keep returning the same response. For the moment I just assume that's how it works and that I have to waste 2 or more responses.

Changing the subject, the main problem I have at the moment is that sometimes the endpoint seems to get stuck. Usually, the library works as expected: it starts piping the response and the client starts receiving fragments in less than 2 seconds. But sometimes, it just gets stuck literally forever (until the timeout) without returning anything.

As my endpoint is little more than the stream, I assume the stream is the cause, maybe because of an error while fetching the media. You can take a look at the code here if you want.

Once again, many thanks for the replies and sorry for the spam. Regards, David

TimeForANinja commented 3 years ago

if i had to bet: you set res.setHeader("Accept-Ranges", "bytes"); which means you might receive requests including a Range header, but at no point do you parse those headers. Browsers sometimes split requests: e.g. they start the video and load from 0 to 1000. Then they terminate the socket since they're like "i already cached data for the next 10 minutes, let's use that socket for something else", and after 10 minutes, when the cache runs out, they re-request the same resource with a range header 1000-. atm you'd completely ignore that second request, so the video stops
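A rough sketch of what answering such a follow-up request could look like at the HTTP level: a 206 status with a `Content-Range` header that still reports the full size, so the client keeps knowing the total length. The function and type names are illustrative, and `total` is assumed to be known already (for example from the format's content length).

```typescript
interface PartialResponse {
    status: number;
    headers: Record<string, string>;
}

// Builds the status and headers for a response covering bytes start..end (inclusive)
// of a resource that is `total` bytes long. `end` defaults to the last byte.
function partialResponse(total: number, start: number, end: number = total - 1): PartialResponse {
    if (start < 0 || start >= total || end < start) {
        // 416 Range Not Satisfiable: report the real size so the client can retry.
        return { status: 416, headers: { "Content-Range": `bytes */${total}` } };
    }
    // Clamp the end to the last available byte.
    const last = Math.min(end, total - 1);
    return {
        status: 206,
        headers: {
            "Accept-Ranges": "bytes",
            "Content-Range": `bytes ${start}-${last}/${total}`,
            "Content-Length": String(last - start + 1),
        },
    };
}
```

For the `1000-` request from the example above, on a 5000-byte resource, this would describe bytes 1000 to 4999 out of 5000.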

thedaviddelta commented 3 years ago

I think I have understood you, and I'm sure that's not the problem, because in every request the endpoint returns the whole audio stream, ignoring the contents of the Range header, and the client manages it based on the playback second.

When this error happens the endpoint logs a 500 error, so it is probably caused by some kind of exception, but even when surrounding the whole code in a try/catch or using stream.on("error"), it doesn't seem to report any kind of message.
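One general Node.js point that may be relevant here: stream errors are emitted asynchronously, so a try/catch around a `.pipe()` call cannot see them, and `.pipe()` does not forward errors to the destination. A minimal sketch of surfacing them with `pipeline` from `node:stream/promises` follows. The failing source and the sink are hypothetical stand-ins for the media stream and the HTTP response, and this is not a diagnosis of the 500 error itself.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical stand-in for a media stream that fails as soon as it is read.
function failingSource(): Readable {
    return new Readable({
        read() {
            this.destroy(new Error("upstream failed"));
        },
    });
}

// Hypothetical stand-in for the HTTP response: discards whatever it receives.
function sink(): Writable {
    return new Writable({
        write(_chunk, _encoding, callback) {
            callback();
        },
    });
}

// Resolves to null on success, or to the error that broke the stream.
async function streamTo(source: Readable, destination: Writable): Promise<Error | null> {
    try {
        await pipeline(source, destination);
        return null;
    } catch (err) {
        return err instanceof Error ? err : new Error(String(err));
    }
}
```

Because `pipeline` is awaited, the try/catch here does observe the failure, which would give the endpoint a chance to log the message before responding.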

Anyway, it seems that my hosting provider gives me another problem: it limits serverless function execution to 10 seconds, so the endpoint is unable to play long tracks and I have to find another solution. I also don't want to disturb you while you are all busy with the 404 error.

Regards, David.

TimeForANinja commented 3 years ago

I think I have understood you, and I'm sure that's not the problem, because in every request the endpoint returns the whole audio stream, ignoring the contents of the Range header, and the client manages it based on the playback second.

i'd still give supporting range a try if i were you 😉

thedaviddelta commented 3 years ago

Hello and sorry for the late response,

I've been testing that too and it produces the same error. Furthermore, it causes another issue: the client manages the playback position based on the full stream length, which is shorter than expected when only a range is returned.

This has reached a difficult-to-handle spot, as today I've also been trying to use proxies, without success. I've written up a long, detailed issue in my own repo (TheDavidDelta/kainet-music#6) if you want to take a look at it.

Thanks for all the time given to this issue. Regards, David.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.