Error 400 youtube link.

Dhaiwat10 / rlp-proxy

The proxy used by @dhaiwat10/react-link-preview to fetch metadata for URLs.

https://www.npmjs.com/package/@dhaiwat10/react-link-preview


Error 400 youtube link. #1

Closed jeremias-jalil closed 3 years ago

jeremias-jalil commented 3 years ago

The proxy returns no response, and nothing renders, for YouTube links.

Dhaiwat10 commented 3 years ago

Interesting. Thanks for reporting this.

This issue is specific to these YT video URLs, right? The scraper isn't able to get any data from such video URLs. It was working fine until today. I'll try to have a look.

Dhaiwat10 commented 3 years ago

Here is what I found:

The Heroku proxy returns metadata: null, but if I run the proxy locally I get the proper response. (see images below)

Dhaiwat10 commented 3 years ago

Anyway, the status code should be 404, not 400, when there is no metadata. Thanks for pointing that out; fixed!
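To make the 400 vs. 404 distinction concrete, here is a minimal sketch of how a handler could pick the status code. The helper name `statusForLookup` and the `Metadata` type are hypothetical, and returning 400 for a missing `url` parameter is an assumption; only "404 when there is no metadata" comes from this thread.

```typescript
// Hypothetical shape for scraped metadata; not taken from the rlp-proxy source.
type Metadata = Record<string, string>;

// Hypothetical helper: choose an HTTP status for a metadata lookup.
function statusForLookup(
  url: string | undefined,
  metadata: Metadata | null
): number {
  // Assumption: a request without a url parameter is the caller's mistake.
  if (!url) return 400;
  // From the thread: a valid request that yields no metadata is "not found".
  if (metadata === null) return 404;
  return 200;
}
```

With this split, a client can tell "I sent a bad request" apart from "the scraper found nothing for this page".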

reflash commented 3 years ago

I had the same issue, so I deployed an instance on Heroku.

I get somewhat odd results, though. If I use /v2?url= I get the following:

{"metadata":{"title":" - YouTube","description":"Enjoy the videos and music that you love, upload original content and share it all with friends, family and the world on YouTube.","image":"undefined/img-placeholder.jpg","siteName":"","hostname":"www.youtube.com"}}

If I use it as shown in your screenshot I get correct results, but those are in a different format, so I can't use them.
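For anyone consuming the /v2 output, the JSON above suggests a response shape like the one sketched here. The field names are copied from that output; the interface names and the `looksLikeGenericPage` helper are hypothetical, and the two heuristics (a title that is only a " - Site" suffix, an image path starting with "undefined/") are assumptions based on this single sample.

```typescript
// Field names taken from the /v2 output shown above.
interface LinkMetadata {
  title: string;
  description: string;
  image: string;
  siteName: string;
  hostname: string;
}

interface V2Response {
  metadata: LinkMetadata | null;
}

// Hypothetical heuristic: flag results that look like the generic
// site page rather than the specific video that was requested.
function looksLikeGenericPage(meta: LinkMetadata): boolean {
  // " - YouTube" has no video title in front of the site suffix.
  const titleIsBare = meta.title.trim().indexOf("- ") === 0;
  // "undefined/img-placeholder.jpg" hints that no real image was found.
  const imageIsPlaceholder = meta.image.indexOf("undefined/") === 0;
  return titleIsBare || imageIsPlaceholder;
}
```

A check like this would not fix the proxy, but it lets a caller avoid rendering a misleading preview.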

Dhaiwat10 commented 3 years ago

Yeah, the / route returns the results in an old format, used in & maintained only for older versions of the package.
The /v2 route is the one we want to use. I incorrectly used the / route in the screenshot above. Apologies.
The output you shared is for youtube.com - the homepage and not an individual video. (correct me if I'm wrong) The problem is with the individual videos. Try these links: http://rlp-proxy.herokuapp.com/v2?url=https://youtube.com, http://rlp-proxy.herokuapp.com/?url=https://www.youtube.com/watch?v=WA55cpvLGkk&ab_channel=FORMULA1

Notice how the output is fine for the homepage, but not for the individual video. I hope this was helpful.
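A side note on building these test links: in a raw query string, `&` separates parameters, so the `&ab_channel=FORMULA1` part of the video URL above is read as a second parameter of the proxy request rather than as part of `url`. Whether that contributes to the problem here is not established. The sketch below percent-encodes the target first; `buildProxyUrl` is a hypothetical helper, and the base URL is the one used in this thread.

```typescript
// Base URL as used in this thread.
const PROXY_BASE = "http://rlp-proxy.herokuapp.com/v2";

// Hypothetical helper: percent-encode the target so its own "?", "&"
// and "=" characters stay inside the single url parameter.
function buildProxyUrl(target: string): string {
  return `${PROXY_BASE}?url=${encodeURIComponent(target)}`;
}

const proxyUrl = buildProxyUrl(
  "https://www.youtube.com/watch?v=WA55cpvLGkk&ab_channel=FORMULA1"
);
// proxyUrl ends with "...watch%3Fv%3DWA55cpvLGkk%26ab_channel%3DFORMULA1"
```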

reflash commented 3 years ago

Thank you! The thing is, I did use an individual video link. See this: https://rlp-proxy-edroom.herokuapp.com/v2?url=https://www.youtube.com/watch?v=cuHDQhDhvPE

Dhaiwat10 commented 3 years ago

I see. I can't figure out the reason behind this inconsistency. I'll post an update here if I find something.

reflash commented 3 years ago

Is there any update on this? I could try to take a look, but I'm not even sure where to start 😄

Dhaiwat10 commented 3 years ago

Haha, so the project basically uses this npm package to retrieve the metadata: html-metadata-parser. I haven't written my own scraper.

reflash commented 3 years ago

So the problem appears to be here: https://github.com/Dhaiwat10/rlp-proxy/blob/1dc7d107599fd2960152c4796e92f1832a5fe601/src/index.ts#L72

Is there any specific reason this line was added? YouTube links contain both lowercase and uppercase characters in the video hash, so it seems the lookup defaults to the base YouTube link when the video is not found.

Dhaiwat10 commented 3 years ago

"Is there any specific reason this line was added?"

I added this because I check for links that do not start with http:// or https:// and prepend http:// to them (see the code on line 73).

"youtube links contain both lower and upper case characters in video hash, so it seems it defaults to base youtube link if not found"

Interesting. Nice catch. I'll have a look at your PR now, thanks 👍🏻
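For readers landing here later, a minimal sketch of the idea discussed above: add the scheme when it is missing, but never change the case of the URL itself. The code on lines 72 and 73 is not reproduced in this thread, so this is not the actual fix; `ensureScheme` is a hypothetical helper, and the assumption (inferred from the comments above) is that the original line changed the URL's case before the check.

```typescript
// Hypothetical helper: prepend "http://" only when no scheme is present.
function ensureScheme(rawUrl: string): string {
  const trimmed = rawUrl.trim();
  // The "i" flag makes the test case-insensitive, so the URL
  // never has to be lowercased just to be compared.
  if (/^https?:\/\//i.test(trimmed)) return trimmed;
  return `http://${trimmed}`;
}

// Why case matters: lowercasing the whole URL changes the video ID.
const original = "https://www.youtube.com/watch?v=cuHDQhDhvPE";
const lowered = original.toLowerCase();
// lowered is "https://www.youtube.com/watch?v=cuhdqhdhvpe", a different ID

const fixed = ensureScheme("www.youtube.com/watch?v=cuHDQhDhvPE");
// fixed is "http://www.youtube.com/watch?v=cuHDQhDhvPE", ID intact
```

Hostnames and schemes are case-insensitive, but paths and query values such as YouTube video IDs are not, which is why the check is kept separate from the value.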