Closed tractorcow closed 4 years ago
I'm not totally averse to this, but the reason we haven't done it in the past is to encourage people to PR their bot user-agents and not keep them to themselves.
What are your thoughts on this?
I think there are certain situations where flexibility is necessary: a global site might care more about catching more bots than about excluding certain traffic, whereas a China-based site might care more about excluding non-regional traffic. I think a single "source of truth" ignores that regional context. :)
Perhaps a good medium is an "aggressive mode" where greylisted bots are recorded, and can be flagged either way?
In this context, MicroMessenger is a valid non-bot messenger, but because it's misused so extensively by bots masquerading as that user agent, it could deserve greylist status.
I'm away from my main computer this week taking some time off, so will review this PR in full when I'm back. Thanks for your thoughts 👍
You're welcome; no rush. :)
@MaxGiting Any thoughts?
If this is intended as a way to quickly get user agents added that should be in this library anyway, then I am opposed, for the same reason you already mention, @JayBizzle: they should be PR'd.
If it is for the intention of adding user agents we will never add to the library then maybe, but from the examples given above I am not sold.
To me a bot is a bot regardless of region and should be in this library. Similarly, if there were a grey list of bots, they should all just be included. Bad bots often change their user agent to look like legitimate traffic by using generic browser user agents, so I'm not sure this is a big help in those situations.
@tractorcow have I misunderstood your regional and masked traffic examples?
And should WeChat simply be in this library anyway? Is it an actual browser as well or just the user agent for preview links?
Given that stance, we need to remove that agent from the blacklist, since it's a valid browser.
Closing to re-open with the appropriate change as requested.
Replacement PR at https://github.com/JayBizzle/Crawler-Detect/pull/395
Thanks for your response @MaxGiting :D
This allows, for instance, a custom app to whitelist or blacklist custom agents. For example, this is user code from an app which needs to manually whitelist WeChat.
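A minimal sketch of what such user code could look like. This is a hypothetical wrapper written for illustration only: the `WhitelistedCrawlerCheck` class and its `$fallback` callable are assumptions, not part of Crawler-Detect's real API; in practice the fallback would delegate to `Jaybizzle\CrawlerDetect\CrawlerDetect::isCrawler()`.

```php
<?php
// Hypothetical user-land wrapper (not the library's actual extension point):
// consult an app-local whitelist before falling back to the library's verdict.
class WhitelistedCrawlerCheck
{
    /** @var string[] Substrings of agents this app treats as legitimate. */
    private $whitelist;

    /** @var callable fn(string $ua): bool — the underlying bot check. */
    private $fallback;

    public function __construct(array $whitelist, callable $fallback)
    {
        $this->whitelist = $whitelist;
        $this->fallback = $fallback;
    }

    public function isCrawler(string $userAgent): bool
    {
        foreach ($this->whitelist as $allowed) {
            if (stripos($userAgent, $allowed) !== false) {
                return false; // whitelisted agent: never flag as a bot
            }
        }

        return ($this->fallback)($userAgent);
    }
}
```

With this shape, an app that trusts WeChat traffic would construct `new WhitelistedCrawlerCheck(['MicroMessenger'], ...)` and pass the library's check as the fallback, so the regional policy lives in the app rather than in the shared list.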
Setters return `$this` to support chaining, e.g. `$crawler->setUaHttpHeaders($headers)->setHeaders($appHeaders)`.
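To make the fluent-setter pattern concrete, here is a self-contained sketch. The `Detector` class and its method bodies are illustrative only, mirroring the method names from the comment above rather than the library's real implementation:

```php
<?php
// Illustrative only: returning $this from each setter is what enables chaining.
class Detector
{
    /** @var string[] */
    private $uaHttpHeaders = [];

    /** @var array<string, string> */
    private $headers = [];

    public function setUaHttpHeaders(array $headers): self
    {
        $this->uaHttpHeaders = $headers;
        return $this; // hand the same instance back to the caller
    }

    public function setHeaders(array $headers): self
    {
        $this->headers = $headers;
        return $this;
    }

    public function getHeaders(): array
    {
        return $this->headers;
    }
}

// Chained calls, as in the comment above:
$crawler = (new Detector)
    ->setUaHttpHeaders(['HTTP_USER_AGENT'])
    ->setHeaders(['X-App' => 'demo']);
```

Because each setter returns the instance itself, the second call operates on the same object, so configuration reads as a single expression.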