Closed himiro closed 6 years ago
Can I ask how you know (with the exception of TrendsmapResolver) that the first batch are all from Twitter and the second are all from Facebook?
There is another issue, #238, that has brought up Facebook using different headers when prefetching. Currently we do not inspect the HTTP_PURPOSE or HTTP_X_PURPOSE headers, but Facebook and others have been known to use these when prefetching data, i.e. when the request is not an actual user visit.
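A minimal sketch of what inspecting those headers could look like, written in Python for illustration rather than in crawler-detect's own PHP. The helper name `is_prefetch_request` and the accepted values (`prefetch`, `preview`) are assumptions, not something the library currently implements; the header names are the two mentioned above.

```python
# Hypothetical helper: not part of crawler-detect.
PREFETCH_HEADERS = ("HTTP_PURPOSE", "HTTP_X_PURPOSE")
# Assumed values; verify against real traffic before relying on them.
PREFETCH_VALUES = {"prefetch", "preview"}


def is_prefetch_request(server_vars):
    """Return True if a CGI-style header dict looks like a prefetch."""
    for name in PREFETCH_HEADERS:
        value = server_vars.get(name, "")
        if value.strip().lower() in PREFETCH_VALUES:
            return True
    return False
```

With these assumptions, a request carrying `HTTP_X_PURPOSE: preview` would be flagged as a prefetch, while a request with only a user agent header would not.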
Also, yes, we should add TrendsmapResolver as a known bot. We currently check for Trendsmap Resolver with a space, but they have obviously removed this, so we need to check for both.
Would you like to add TrendsmapResolver in a PR?
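Checking for both spellings can be done with a single pattern that makes the space optional. This is a sketch in Python for illustration only; crawler-detect keeps its own patterns in PHP, and the names `TRENDSMAP_PATTERN` and `is_trendsmap` are hypothetical.

```python
import re

# Hypothetical pattern: the space between the two words is optional,
# so both "TrendsmapResolver" and "Trendsmap Resolver" match.
TRENDSMAP_PATTERN = re.compile(r"Trendsmap ?Resolver", re.IGNORECASE)


def is_trendsmap(user_agent):
    """Return True if the user agent names the Trendsmap resolver."""
    return TRENDSMAP_PATTERN.search(user_agent) is not None
```

The generic Internet Explorer and iPhone user agents reported in this issue would not match this pattern, which is the expected behaviour for a name-based check.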
Thanks for the answer. We retrieve the social network data, so we know which one the action comes from.
We already check for prefetching, but I will run some additional tests to see if we missed something.
The PR is done.
We retrieve the social network data, so we know which one the action comes from.
Could you explain this in more detail?
I have left some feedback on the PR, thank you 👍
We get the URL and the headers, which give us some data such as the user agent.
Sorry, I don't understand how you are finding out that the user agents in the list are from Twitter and Facebook.
Is there a specific header or IP address that is telling you those user agents are from social networks?
I didn't handle this part myself, but apparently it comes from the URL and its routes.
Hello,
We use crawler-detect to detect social network bots, and we've noticed that some bot user agents are not being detected. Here they are:
Twitter:
Mozilla/5.0 (compatible; TrendsmapResolver/0.1)
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)
Mozilla/5.0
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)
Facebook:
Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0 Mobile/14G60 Safari/602.1
Mozilla/5.0 (iPhone; CPU iPhone OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1
Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1
Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1
Mozilla/5.0 (iPhone; CPU iPhone OS 12_0_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1
I've put them in tests/crawlers.txt, but we cannot differentiate these user agents from ordinary ones (except for the first), so I have only added TrendsmapResolver to Fixtures/Crawlers.php.
Could you please let me know how to recognize that they're bots?
Yours sincerely,
Mathilde