Closed jeremias-jalil closed 3 years ago
Interesting. Thanks for reporting this.
This is an issue particularly with these YT video URLs, right? The scraper is not being able to get any data from such video URLs. It was working fine until today. I'll try and have a look.
Here is what I found:
The Heroku proxy returns metadata: null
, but if I run the proxy locally I get the proper response. (see images below)
Anyways, the status code should be 404
when there is no metadata, and not 400
. Thanks for pointing that out, fixed!
I had the same issue, so I deployed an instance on heroku.
I get kinda weird results though. if I use /v2?url= I get the following
{"metadata":{"title":" - YouTube","description":"Enjoy the videos and music that you love, upload original content and share it all with friends, family and the world on YouTube.","image":"undefined/img-placeholder.jpg","siteName":"","hostname":"www.youtube.com"}}
if I use it as you've shown on your screenshot I get correct results, but those are in a different format, so I can't use it
Yeah, the /
route returns the results in an old format, used in & maintained only for older versions of the package.
The /v2
route is the one we want to use. I incorrectly used the /
route in the screenshot above. Apologies.
The output you shared is for youtube.com
- the homepage and not an individual video. (correct me if I'm wrong) The problem is with the individual videos. Try these links: http://rlp-proxy.herokuapp.com/v2?url=https://youtube.com, http://rlp-proxy.herokuapp.com/?url=https://www.youtube.com/watch?v=WA55cpvLGkk&ab_channel=FORMULA1
Notice how the output is fine for the homepage, but not for the individual video. I hope this was helpful.
- Yeah, the
/
route returns the results in an old format, used in & maintained only for older versions of the package.- The
/v2
route is the one we want to use. I incorrectly used the/
route in the screenshot above. Apologies.- The output you shared is for
youtube.com
- the homepage and not an individual video. (correct me if I'm wrong) The problem is with the individual videos. Try these links: http://rlp-proxy.herokuapp.com/v2?url=https://youtube.com, http://rlp-proxy.herokuapp.com/?url=https://www.youtube.com/watch?v=WA55cpvLGkk&ab_channel=FORMULA1Notice how the output is fine for the homepage, but not for the individual video. I hope this was helpful.
Thank you! Thing is I've used an individual video link - see this: https://rlp-proxy-edroom.herokuapp.com/v2?url=https://www.youtube.com/watch?v=cuHDQhDhvPE
I see. I can't figure out the reason behind this ambiguity. I'll post an update here if I find something.
Is there any update on this? I could try to take a look, but not even sure where to start 😄
Haha so the project basically uses this npm package that retrieves the metadata: html-metadata-parser
. I haven't written my own scraper
so the problem appears to be in here: https://github.com/Dhaiwat10/rlp-proxy/blob/1dc7d107599fd2960152c4796e92f1832a5fe601/src/index.ts#L72
Is there any specific reason this line was added? youtube links contain both lower and upper case characters in video hash, so it seems it defaults to base youtube link if not found
Is there any specific reason this line was added?
I added this because I'm checking for & pre-pending any links not containing http://
or https://
with http://
. (see the code on line 73)
youtube links contain both lower and upper case characters in video hash, so it seems it defaults to base youtube link if not found
Interesting. Nice catch. I'll have a look at your PR now, thanks 👍🏻
No response or rendering with youtube links.