OP-Engineering / link-preview-js

⛓ Extract web link information (title, description, images, videos, etc.) via OpenGraph; runs on mobile and Node.
MIT License
754 stars, 120 forks

Unable to fetch title, description, and images for a tweet (YouTube works) #112

Closed: 0xbe1 closed this issue 2 years ago

0xbe1 commented 2 years ago

Describe the bug: Unable to fetch the title, description, and images for a tweet.

To reproduce:

  1. Call the library: getLinkPreview('https://twitter.com/elonmusk/status/1515799688296943636', {proxyUrl: xxx}).then(data => console.log(data))
  2. Observe the result:
    {
      "url": "https://twitter.com/elonmusk/status/1515799688296943636",
      "title": "",
      "siteName": "Twitter",
      "mediaType": "website",
      "contentType": "text/xml",
      "images": [],
      "videos": [],
      "favicons": [
        "https://abs.twimg.com/favicons/twitter.2.ico",
        "https://abs.twimg.com/responsive-web/client-web-legacy/icon-ios.b1fc7276.png"
      ]
    }
  3. Adding User-Agent: Twitterbot to the request headers gives the same result.

However, I can get all of this information for a YouTube link, https://www.youtube.com/watch?v=MejbOFk7H6c:

{
  "url": "https://www.youtube.com/watch?v=MejbOFk7H6c",
  "title": "OK Go - Needing/Getting - Official Video",
  "siteName": "YouTube",
  "description": "Website | http://www.okgo.netInstagram | http://www.instagram.com/okgoTwitter | http://www.twitter.com/okgoFacebook | http://www.facebook.com/okgoStore | htt...",
  "mediaType": "video.other",
  "contentType": "text/xml",
  "images": [
    "https://i.ytimg.com/vi/MejbOFk7H6c/maxresdefault.jpg"
  ],
  "videos": [],
  "favicons": [
    "https://www.youtube.com/s/desktop/b292b5ed/img/favicon_32x32.png",
    "https://www.youtube.com/s/desktop/b292b5ed/img/favicon_48x48.png",
    "https://www.youtube.com/s/desktop/b292b5ed/img/favicon_96x96.png",
    "https://www.youtube.com/s/desktop/b292b5ed/img/favicon_144x144.png",
    "https://www.youtube.com/s/desktop/b292b5ed/img/favicon.ico"
  ]
}

Expected behavior: The description and images of the tweet https://twitter.com/elonmusk/status/1515799688296943636 should be returned.

Screenshots: Here's the tweet:

[screenshot of the tweet]

Additional context: Thank you very much for the library, and thanks in advance for your help!

ospfranco commented 2 years ago

Read the README

0xbe1 commented 2 years ago

Hi @ospfranco, thanks for the reply! I have read the README but couldn't find the answer there. Could you please explain? Thanks a lot!

ospfranco commented 2 years ago

You cannot make cross-domain requests from a browser.

0xbe1 commented 2 years ago

Oh, the request goes through my backend proxy; let me update the issue description. Here's the backend log:

INFO:     None:0 - "GET /cors_proxy/https%3A//twitter.com/elonmusk/status/1515799688296943636 HTTP/1.1" 200 OK

Could you take another look at the issue? Thanks!

ospfranco commented 2 years ago

Then it is not a problem with the library. As you can see, you get a proper response from YouTube, so Twitter is not returning the headers/information needed to extract the preview.

Either it purposefully does not return this info (maybe due to some user configuration?), or it does not use OpenGraph tags, or it changed the shape of its response so the parsing/extraction no longer works.

As stated in the README, you will have to debug this type of issue yourself. The code is fairly simple: it fetches the HTML and looks for the relevant OpenGraph tags. If the info is in the HTML, you can submit a PR that handles exceptional cases by parsing other tags. Otherwise, the info simply isn't there and nothing can be done about it.
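To check whether the info is actually in the HTML your proxy returns, it can help to run the same kind of lookup by hand. The following is a minimal, regex-based sketch, not link-preview-js's own code: it reads og: meta tags and, as an example of the kind of fallback a PR might add, falls back to twitter: card tags and the <title> element. The names readMeta, readTitleTag, and extractPreview, and the fallback order, are hypothetical and chosen for illustration. A real implementation should use a proper HTML parser, because these regexes only handle one attribute order and simple quoting.

```javascript
// Hypothetical helpers, NOT part of link-preview-js: a simplified illustration
// of reading OpenGraph tags from raw HTML, with fallbacks to other tags.
// Limitation: only matches <meta property|name="..." ... content="..."> with the
// name attribute before content, and content values without quote characters.

function readMeta(html, key) {
  // Escape regex metacharacters so the key is matched literally.
  const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(
    `<meta[^>]+(?:property|name)=["']${escaped}["'][^>]*content=["']([^"']*)["']`,
    "i"
  );
  const match = html.match(re);
  return match ? match[1] : undefined;
}

function readTitleTag(html) {
  // Fallback: the plain <title> element, trimmed.
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? match[1].trim() : undefined;
}

function extractPreview(html) {
  // Try OpenGraph first, then Twitter card tags, then generic tags.
  const title =
    readMeta(html, "og:title") ??
    readMeta(html, "twitter:title") ??
    readTitleTag(html) ??
    "";
  const description =
    readMeta(html, "og:description") ??
    readMeta(html, "twitter:description") ??
    readMeta(html, "description");
  const image = readMeta(html, "og:image") ?? readMeta(html, "twitter:image");
  return { title, description, images: image ? [image] : [] };
}
```

Passing the raw HTML that the proxy returns for the tweet through extractPreview shows quickly whether any of these tags are present at all; if every field comes back empty, the data is most likely not in the response.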

0xbe1 commented 2 years ago

@ospfranco, thank you so much for the details; I will try to submit a PR if I can. Thanks again! :)

ospfranco commented 2 years ago

Since this is a social network, it could also be an issue with redirects, for example being redirected to a login screen. In any case, the library is so simple that 98% of the time the problem is not the code but how the services respond to requests.
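One way to test the redirect theory is to compare the URL you requested with the URL the response finally came from. The sketch below assumes Node 18+ (or a browser), where fetch follows redirects by default and the resulting Response exposes the standard redirected and url properties. The helper name describeRedirect and the "login" path heuristic are hypothetical, chosen only for illustration. If the request goes through a proxy, as in this issue, the check has to run where the redirect is actually followed, i.e. on the proxy's side.

```javascript
// Hypothetical diagnostic helper, NOT part of link-preview-js.
// Accepts anything shaped like a fetch Response ({ url, redirected }), so it can
// be exercised without a network call.
function describeRedirect(requestedUrl, response) {
  const requested = new URL(requestedUrl);
  // response.url can be an empty string for synthetic responses; fall back.
  const finalUrl = new URL(response.url || requestedUrl);
  // Heuristic (assumption): a final path containing "login" suggests a login wall.
  const looksLikeLogin = /login/i.test(finalUrl.pathname);
  const redirected =
    response.redirected === true ||
    finalUrl.origin !== requested.origin ||
    finalUrl.pathname !== requested.pathname;
  return { redirected, looksLikeLogin, finalUrl: finalUrl.href };
}

// Usage sketch (Node 18+, needs network access, run where the fetch happens):
// const res = await fetch(url);
// console.log(describeRedirect(url, res));
```

If redirected is true and the final URL points at a login page, the HTML that reaches the parser belongs to that page, not to the tweet, which would explain an empty title and no images.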