Open jekram opened 4 years ago
When we fetch the metadata of a URL using the library 'open-graph-scraper', some links do not return the metadata. Instead, we get this response: { ogTitle: 'Bloomberg - Are you a robot?', ogImage: [] }
Locally, the request returns the metadata, but on staging and production it returns the response above. I tried another library to fetch the metadata and still got the same response, so the issue is probably related to the origin of the request. I searched for this issue but couldn't find a solution. I have opened an issue on the GitHub repository of open-graph-scraper: https://github.com/jshemas/openGraphScraper/issues/93 We have also posted this question on Stack Overflow: https://stackoverflow.com/questions/62590220/getting-are-you-a-robot-response-when-trying-to-get-url-meta-using-open-graph
Waiting for a response now.
What is the difference between calling it locally and calling it from staging or production? Is it not the same API?
Yes, the API is the same. Only the origin of the request is different, i.e. one call is made locally and the other is made from staging or production. Basically, the website is blocking us from scraping it. I have received the following response and will try the solution suggested there today:
So, I spent a lot of time on this issue today along with @ImranBinShoukat. Basically, the issue is that some URLs (e.g. bloomberg.com) are blocking our bot from getting their metadata. This happens when a website decides that a robot is trying to access it. I don't know what behaviour triggers this. We are able to get the metadata of most URLs; only the Bloomberg website is refusing to let us fetch its metadata. We tried to use a proxy as suggested by the commenter on the GitHub issue (screenshot provided in the comment above) but couldn't get it to work. We will have to set up a separate proxy server, and for that we will have to fully understand how proxy servers work.
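Until a proxy is in place, it may help to at least recognise the blocked response so it is not stored as real metadata. Below is a minimal sketch, not something taken from open-graph-scraper itself: the `OgResult` shape, the `CHALLENGE_PHRASES` list and the `looksLikeBotChallenge` helper are all hypothetical names, modelled only on the response quoted at the top of this issue.

```typescript
// Hypothetical shape: only the two fields seen in the blocked response above.
interface OgResult {
  ogTitle?: string;
  ogImage?: unknown[];
}

// Assumed list of phrases that suggest a bot-challenge page instead of real
// metadata. Only the first one was actually observed in this issue.
const CHALLENGE_PHRASES: string[] = ["are you a robot", "captcha"];

// Returns true when the title looks like a bot-challenge page.
function looksLikeBotChallenge(result: OgResult): boolean {
  const title = (result.ogTitle ?? "").toLowerCase();
  return CHALLENGE_PHRASES.some((phrase) => title.includes(phrase));
}

// Usage with the response we are getting on staging and production.
const blocked: OgResult = { ogTitle: "Bloomberg - Are you a robot?", ogImage: [] };
console.log(looksLikeBotChallenge(blocked)); // true
```

This does not unblock the request. It only lets the caller skip or retry the link instead of showing "Are you a robot?" as the page title.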
Autoposting issue
Pagename: Kiboalert jekram@hotmail.com
Getting only part of the tweet: