Cloudkibo / KiboPush

Autoposting issue #8993

Open jekram opened 4 years ago

jekram commented 4 years ago

Autoposting issue

Pagename: Kiboalert jekram@hotmail.com

Getting only part of the tweet:

[Two screenshots: Screen Shot 2020-06-25 at 12 02 00 AM, Screen Shot 2020-06-25 at 12 01 23 AM]

AnishaChhatwani commented 4 years ago

When we fetch a URL's metadata with the 'open-graph-scraper' library, some links do not return the metadata. Instead, we get this response: { ogTitle: 'Bloomberg - Are you a robot?', ogImage: [] }

Locally, the library returns the metadata, but on staging and production it returns the response above. I tried another library to fetch the metadata and still got the same response, so the issue is probably with the origin of the request. I googled the issue but couldn't find a solution. I have opened an issue on the GitHub repository of open-graph-scraper: https://github.com/jshemas/openGraphScraper/issues/93

We have also posted this question on Stack Overflow: https://stackoverflow.com/questions/62590220/getting-are-you-a-robot-response-when-trying-to-get-url-meta-using-open-graph

Waiting for a response now.
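
Until the root cause is found, the calling code could at least recognise the "Are you a robot?" response quoted above and avoid posting it as if it were real metadata. The following is a minimal sketch, not code from KiboPush: the helper name `looksBlocked` and the pattern list are assumptions, and only the first pattern is taken from the response shown in this thread.

```javascript
// Hypothetical helper: decide whether a scraped result looks like a
// bot-challenge page instead of real Open Graph metadata.
// Only the "are you a robot" phrase comes from the response in this issue;
// add further patterns only after seeing them in real responses.
const BLOCK_PATTERNS = [/are you a robot/i];

function looksBlocked(result) {
  // Treat a missing result or missing title as "not blocked" so that pages
  // that simply lack a title are handled by the normal code path.
  const title = (result && result.ogTitle) || '';
  return BLOCK_PATTERNS.some((pattern) => pattern.test(title));
}
```

For example, `looksBlocked({ ogTitle: 'Bloomberg - Are you a robot?', ogImage: [] })` returns `true`, while a result with an ordinary article title returns `false`. The check deliberately ignores the empty `ogImage` array, since a legitimate page may also have no image.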

jekram commented 4 years ago

What is the difference between calling it locally and calling it from staging or production? Is it not the same API?

AnishaChhatwani commented 4 years ago

Yes, the API is the same. Only the origin of the request differs, i.e. one request is made locally and the other from staging or production. Basically, the website is blocking us from scraping it. I have received the following response and will try the solution suggested there today:

[Screenshot: Screen Shot 2020-06-29 at 9 28 39 AM]

AnishaChhatwani commented 4 years ago

So, I spent a lot of time on this issue today along with @ImranBinShoukat. Basically, the issue is that some URLs (e.g. bloomberg.com) are blocking our bot from getting their metadata. This happens when a website decides that a robot is trying to access it. I don't know what behaviour triggers this. We are able to get the metadata of most URLs; only the Bloomberg website is not allowing us to fetch its metadata. We tried to use a proxy, as suggested in the GitHub issue (screenshot in the comment above), but couldn't get it to work. We will have to set up a separate proxy server, and for that we will have to fully understand how proxy servers work.
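
As a starting point for understanding how a forward proxy handles an HTTPS request, here is a rough sketch using only Node's built-in `http` and `tls` modules. It is an illustration under assumptions, not the fix for this issue: the function names are hypothetical, the proxy URL is whatever proxy server we end up running, and there is no guarantee that a site which blocks our servers will accept requests coming through a proxy. The client first sends a `CONNECT host:port` request to the proxy, the proxy opens a TCP tunnel to the target, and the client then performs the TLS handshake and the normal `GET` through that tunnel.

```javascript
const http = require('http');
const tls = require('tls');

// Build the options for the CONNECT request that asks the proxy to open
// a tunnel to the target host. Both URLs are supplied by the caller.
function buildConnectOptions(proxyUrl, targetUrl) {
  const proxy = new URL(proxyUrl);
  const target = new URL(targetUrl);
  return {
    host: proxy.hostname,
    port: Number(proxy.port) || 80,
    method: 'CONNECT',
    // For CONNECT, the "path" is the host:port the proxy should dial.
    path: `${target.hostname}:${target.port || 443}`,
  };
}

// Fetch an https URL through the tunnel. Resolves with the raw HTTP
// response (status line, headers and body); it does not decode chunked
// transfer encoding or follow redirects.
function fetchRawViaProxy(proxyUrl, targetUrl) {
  const target = new URL(targetUrl);
  return new Promise((resolve, reject) => {
    const req = http.request(buildConnectOptions(proxyUrl, targetUrl));
    req.on('connect', (res, socket) => {
      if (res.statusCode !== 200) {
        socket.destroy();
        reject(new Error(`Proxy refused tunnel: ${res.statusCode}`));
        return;
      }
      // Run TLS over the tunnelled socket, then send a plain GET.
      const secure = tls.connect({ socket, servername: target.hostname }, () => {
        secure.write(
          `GET ${target.pathname}${target.search} HTTP/1.1\r\n` +
          `Host: ${target.hostname}\r\n` +
          'Connection: close\r\n\r\n'
        );
      });
      let raw = '';
      secure.setEncoding('utf8');
      secure.on('data', (chunk) => { raw += chunk; });
      secure.on('end', () => resolve(raw));
      secure.on('error', reject);
    });
    req.on('error', reject);
    req.end();
  });
}
```

With a proxy listening at a placeholder address such as `http://proxy.example:8080`, `buildConnectOptions('http://proxy.example:8080', 'https://www.bloomberg.com/')` yields a `CONNECT` request for `www.bloomberg.com:443`. The HTML in the raw response would still need to be parsed for Open Graph tags.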