WhichBrowser / Parser-PHP

Browser sniffing gone too far — A useragent parser library for PHP
http://whichbrowser.net
MIT License

Add `Yahoo!` missing regex and upgrade to using oop php #587

Closed: summercms closed this 2 years ago

summercms commented 4 years ago

See github issue: https://github.com/WhichBrowser/Parser-PHP/issues/568

Yahoo! Slurp Bot

All these combinations are being used right now!

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

and

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) .default/1560563738-0
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) sieve-gq1/1544184308-0
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) kafe/1532207352-0

and

Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

and

Mozilla/5.0 (compatible; Yahoo! DE Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

and

Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp) NOT Firefox/3.5
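The Slurp variants listed above differ only in an optional region code before `Slurp` and an optional version after it. A minimal sketch of one pattern that covers them follows. The constant `YAHOO_SLURP_REGEXP` and the helper `parseYahooSlurp()` are hypothetical names for illustration, not the rule that ended up in Bot.php.

```php
<?php
// Hypothetical pattern (an assumption, not the merged rule): optional two-letter
// region code before "Slurp", optional "/version" after it.
const YAHOO_SLURP_REGEXP = '/Yahoo! (?:([A-Z]{2}) )?Slurp(?:\/([0-9.]+))?/u';

/**
 * Returns ['region' => ?string, 'version' => ?string] for a Slurp user agent,
 * or null when the string does not contain a "Yahoo! ... Slurp" token.
 */
function parseYahooSlurp(string $ua): ?array
{
    if (preg_match(YAHOO_SLURP_REGEXP, $ua, $m) !== 1) {
        return null;
    }

    // Unmatched optional groups come back as '' or are missing entirely.
    $region  = ($m[1] ?? '') !== '' ? $m[1] : null;
    $version = ($m[2] ?? '') !== '' ? $m[2] : null;

    return ['region' => $region, 'version' => $version];
}
```

This pattern would also match the `Yahoo! Slurp China` string, so a dedicated China rule would have to be checked before it.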

Yahoo! Slurp China Bot

Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)

Yahoo! Cache System Bot

YahooCacheSystem
YahooCacheSystem; YahooWebServiceClient
YahooCacheSystem;+YahooWebServiceClient

Yahoo! Japan Bot (Y!J-BRW)

Y!J-BRW/1.0 (https://www.yahoo-help.jp/app/answers/detail/p/595/a_id/42716)

Yahoo! Japan Bot (Y!J-ASR)

Y!J-ASR/1.0 crawler (https://www.yahoo-help.jp/app/answers/detail/p/595/a_id/42716/)

Yahoo! Japan Bot (Y!J-SRD)

DoCoMo/2.0 SH902i (compatible; Y!J-SRD/1.0; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-27.html)

KDDI-CA33 UP.Browser/6.2.0.10.4 (compatible; Y!J-SRD/1.0; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-27.html)

Vodafone/1.0/V705SH (compatible; Y!J-SRD/1.0; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-27.html)

Yahoo! Seeker Testing Bot (2015 - 2019)

YahooSeeker-Testing/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/)

This bot appears to have escaped onto the internet: its user agent still claims IE 5.5 and Mozilla 4.0. Keep tracking it in the repo.

    [ 'name' => 'Yahoo! Seeker',                'id'    => 'yahoo',      'regexp' => '/YahooSeeker(?:\/([0-9.]*))?/u' ],
    [ 'name' => 'Yahoo! Seeker',                'id'    => 'yahoo',      'regexp' => '/YahooSeeker-Testing\/v([0-9.]*)/u' ],
    [ 'name' => 'Yahoo! Seeker',                'id'    => 'yahoo',      'regexp' => '/yahooseeker-jp-mobile/u' ],
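Rule order matters for these three entries. The sketch below assumes the parser stops at the first rule that matches; that is an assumption about the matching loop, not something stated in this thread. The helper `firstMatchVersion()` is hypothetical. Under that assumption, the generic `YahooSeeker` rule matches the Testing user agent first and no version is captured.

```php
<?php
// The three entries quoted above, in their original order.
$yahooSeekerRules = [
    ['name' => 'Yahoo! Seeker', 'id' => 'yahoo', 'regexp' => '/YahooSeeker(?:\/([0-9.]*))?/u'],
    ['name' => 'Yahoo! Seeker', 'id' => 'yahoo', 'regexp' => '/YahooSeeker-Testing\/v([0-9.]*)/u'],
    ['name' => 'Yahoo! Seeker', 'id' => 'yahoo', 'regexp' => '/yahooseeker-jp-mobile/u'],
];

/**
 * Hypothetical helper: return the version captured by the first matching rule
 * (null when that rule captured nothing), or false when no rule matches.
 */
function firstMatchVersion(array $rules, string $ua)
{
    foreach ($rules as $rule) {
        if (preg_match($rule['regexp'], $ua, $m) === 1) {
            return ($m[1] ?? '') !== '' ? $m[1] : null;
        }
    }
    return false;
}

$ua = 'YahooSeeker-Testing/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/)';

// Original order: the generic rule wins and the version is lost.
$original = firstMatchVersion($yahooSeekerRules, $ua);

// Specific rule first: the version is captured.
$reordered = firstMatchVersion(
    [$yahooSeekerRules[1], $yahooSeekerRules[0], $yahooSeekerRules[2]],
    $ua
);
```

If the loop really is first-match-wins, placing the `-Testing` rule before the generic one is the simplest fix.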

Yahoo! Link Preview Bot

Mozilla/5.0 (compatible; Yahoo Link Preview; https://help.yahoo.com/kb/mail/yahoo-link-preview-SLN23615.html)

Mozilla/5.0 (compatible; Yahoo Link Preview; https://help.yahoo.com/kb/mail/yahoo-link-preview-SLN23615.html) X-SiteSpeedApp-1

Yahoo! Mail Proxy Bot

YahooMailProxy; https://help.yahoo.com/kb/yahoo-mail-proxy-SLN28749.html

Yahoo! Image Bot

Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)

Mozilla/5.0 (Yahoo-MMCrawler/4.0; mailto:vertical-crawl-support@yahoo-inc.com)

Yahoo-MMCrawler/3.x (mm dash crawler at trd dot overture dot com)

Yahoo! Ad Monitoring

Desktop user agent

Mozilla/5.0 (compatible; Yahoo Ad Monitoring https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html) <YahooInternalTag>

Mobile user agent

Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html) <YahooInternalTag>

Link: https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html
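Note that the two Ad Monitoring strings differ in case ("Monitoring" on desktop, "monitoring" on mobile), so a rule for them needs to be case-insensitive. The following is a rough sketch only; `parseYahooAdMonitoring()` and the shape of its return value are hypothetical and do not mirror the library's result objects.

```php
<?php
/**
 * Hypothetical helper: detect Yahoo! Ad Monitoring and report whether the
 * request used the mobile (iPhone) user agent. Returns null for other strings.
 */
function parseYahooAdMonitoring(string $ua): ?array
{
    // Case-insensitive: desktop says "Monitoring", mobile says "monitoring".
    if (preg_match('/Yahoo Ad Monitoring/iu', $ua) !== 1) {
        return null;
    }

    return [
        'name'   => 'Yahoo! Ad Monitoring',
        'mobile' => strpos($ua, 'iPhone') !== false,
    ];
}
```

The trailing internal tag is ignored here because the pattern is not anchored to the end of the string.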

Reverse DNS

unknown-202-160-178-x.yahoo.com
lj910282.crawl.yahoo.net
dip227.lsn.bf1.yahoo.com
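These hostnames can be used for a forward-confirmed reverse DNS check. The sketch below is an illustration under stated assumptions: the suffix list is inferred only from the three sample hostnames above, the function names are hypothetical, and `gethostbynamel()` only returns IPv4 addresses, so IPv6 clients are not handled.

```php
<?php
// Assumption: suffixes inferred from the three sample hostnames above.
const YAHOO_CRAWLER_SUFFIXES = ['.yahoo.com', '.yahoo.net'];

/** Pure check: does a reverse-DNS hostname end in one of the Yahoo suffixes? */
function isYahooHostname(string $hostname): bool
{
    $hostname = strtolower(rtrim($hostname, '.'));
    foreach (YAHOO_CRAWLER_SUFFIXES as $suffix) {
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

/**
 * Forward-confirmed reverse DNS: IP -> hostname -> IPs must include the original IP.
 * Performs live DNS lookups, so the result depends on the network.
 */
function isVerifiedYahooCrawler(string $ip): bool
{
    // gethostbyaddr() returns the IP unchanged on failure, false on malformed input.
    $hostname = gethostbyaddr($ip);
    if ($hostname === false || $hostname === $ip || !isYahooHostname($hostname)) {
        return false;
    }

    // gethostbynamel() returns false when the name does not resolve.
    $forward = gethostbynamel($hostname);
    return $forward !== false && in_array($ip, $forward, true);
}
```

The leading dot in each suffix is deliberate, so that a host such as `notyahoo.com` does not pass.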

Dead Bots

Y!J-ASR/1.0 (2014 - 2015)

Y!J-ASR/1.0 crawler (http://www.yahoo-help.jp/app/answers/detail/p/595/a_id/42716/)
Y!J-ASR/0.1 crawler (http://www.yahoo-help.jp/app/answers/detail/p/595/a_id/42716/)

Y!J-BRJ/YATS (2012 - 2014)

Y!J-BRJ/YATS crawler (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)
Y!J-BRJ/YATS crawler (http://listing.yahoo.co.jp/support/faq/int/other/other_001.html)

Y!J-BRO/YFSJ (2011 - 2014)

Y!J-BRO/YFSJ crawler (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html; YahooFeedSeekerJp/2.0)

YahooYSMcm/3.0.0 (2014)

Mozilla/5.0 (YahooYSMcm/3.0.0; http://help.yahoo.com)

Y!J-BSC/1.0 (2009 - 2014)

Y!J-BSC/1.0 crawler (http://help.yahoo.co.jp/help/jp/blog-search/)
Y!J-BSC/1.0 (http://help.yahoo.co.jp/help/jp/blog-search/)

Yahoo! Slurp (2014)

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp

Y!J-BRW/1.0 (2011 - 2013)

Y!J-BRW/1.0 crawler (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)

Y!J-BRI/0.0.1 (2009 - 2012)

Y!J-BRI/0.0.1 crawler ( http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html )

Yahoo-MMCrawler/4.0 (2009 - 2010)

Mozilla/5.0 (Yahoo-MMCrawler/4.0; mailto:vertical-crawl-support@yahoo-inc.com)

Yahoo! Site Explorer Feed Validator (2010)

Yahoo! Site Explorer Feed Validator http://help.yahoo.com/l/us/yahoo/search/siteexplorer/manage/

YahooSeeker (2009)

Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)

Yahoo! Blogs (2007)

Link: https://web.archive.org/web/20070208072346/http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html

Remove from repo.

The user agent claims IE 5.5, which is far too old.

Yahoo-Blogs/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html )

Yahoo! Slingstone / Yahoo! Link Expander (2013)

Link: https://web.archive.org/web/20140217025511/https://www.webmasterworld.com/search_engine_spiders/4629974.htm

Yahoo:LinkExpander:Slingstone

Test Link: https://thadafinser.github.io/UserAgentParserComparison/v5/user-agent-detail/44/fe/44fefeee-7071-4bf1-8348-02d49f661776.html

Remove from repo.

Yahoo! Video (2005 - 2015)

Remove from repo.

The Yahoo-MMAudVid bot has not been used for years! Its user agent claims IE 7, which is far too old.

Yahoo-MMAudVid/1.0 (mms dash mmaudvidcrawler dash support at yahoo dash inc dot com)

Yahoo-MMAudVid/2.0(mms dash mm aud vid crawler dash support at yahoo dash inc.com ;Mozilla 4.0 compatible; MSIE 7.0;Windows NT 5.0; .NET CLR 2.0)

Yahoo! Mindset (2005)

Link: https://www.askdavetaylor.com/whats_yahoo_mindset/

Remove from repo.

The UA was:

Yahoo! Mindset

Yahoo! Product Search (2005 - 2008)

Link: https://corsodicrm.files.wordpress.com/2008/05/seo-web-developer-cheat-sheet.pdf

Remove from repo.

The user agent claims IE 5.5, which is far too old.

YahooSeeker/1.1 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)

Yahoo! Pipes (2015)

The UA claims Firefox/3.5.2, which is far too old.

Mozilla/5.0 (compatible; Yahoo Pipes 2.0; +http://developer.yahoo.com/yql/provider) Gecko/20090729 Firefox/3.5.2

As per Wikipedia, Yahoo! Pipes has been defunct since June 30, 2015. Link: https://en.wikipedia.org/wiki/Yahoo!_Pipes

Remove from repo.

Yahoo! Japan (1995)

Remove this:

[ 'name' => 'Yahoo! Japan',                 'id'    => 'yahoo',      'regexp' => '/Yahoo\! Japan/u' ],

The above regex finds the following UA combinations:

Mozilla/4.0 (compatible; Yahoo Japan; for robot study; kasugiya)

Yahoo! Japan

Mozilla/4.0 (compatible; MSIE 5.0; Windows 95; Yahoo! JAPAN Version Windows 95/NT CD-ROM Edition 1.0.; DigExt)

These UAs contain Mozilla/4.0, IE 5 and Windows 95, all far too old.

Yahoo! Video Search (2007 - 2008)

Remove this:

 [ 'name' => 'Yahoo! Video Search',          'id'    => 'yahoo',      'regexp' => '/YahooVideoSearch/u' ],

Finds this UA from 2007:

YahooVideoSearch 1.3

Yahoo! Site Checker (2001)

Remove this:

[ 'name' => 'Yahoo! Site Checker',          'id'    => 'y\!j',      'regexp' => '/Y\!J SiteChecker/u' ],

The above regex finds a user agent that was crawling the web when IE6 and Netscape 5.x were popular.

Yahoo! Feed Seeker (1999)

Remove this:

[ 'name' => 'Yahoo! Feed Seeker',           'id'    => 'yahoo',      'regexp' => '/YahooFeedSeeker\/([0-9.]*)/u' ],
[ 'name' => 'Yahoo! Feed Seeker',           'id'    => 'yahoo',      'regexp' => '/YahooFeedSeeker Testing\/([0-9.]*)/u' ],

Finds the following UA:

YahooFeedSeeker/2.0+(compatible;+Mozilla+4.0;+MSIE+5.5;+http://publisher.yahoo.com/rssguide)

Y!J-BRO/YFSJ crawler (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html; YahooFeedSeekerJp/2.0)

YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; http://publisher.yahoo.com/rssguide)

YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; http://publisher.yahoo.com/rssguide; users 0; views 26)

YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; http://publisher.yahoo.com/rssguide; users 1; views 26)

YahooFeedSeeker/2.0+(compatible;+Mozilla+4.0;+MSIE+5.5;+http://publisher.yahoo.com/rssguide)

These UAs contain Mozilla 4.0 and IE 5.5, both far too old.


Dead bots will not be added in this PR, and the existing rules for them will be removed from this repo.

Link: https://www.yahoo.com/

Link: https://www.yahoo.co.jp/

Link: https://help.yahoo.com/kb/search-for-desktop/SLN22600.html?impressions=true

coveralls commented 4 years ago

Coverage decreased (-0.1%) to 99.807% when pulling b70ff448be218ed4e42a7a9e17ae866f0d0e118e on ayumi-cloud:Yahoo into 880b9fa797401d14b28956442944c3daa70240ff on WhichBrowser:master.

summercms commented 4 years ago

Inktomi Slurp

This company pretty much died in the dot-com bubble days, see Wikipedia: https://en.wikipedia.org/wiki/Inktomi

However, its bots seem to have escaped onto the internet and are still being found to this day!

Current Regex:

    [ 'name' => 'Inktomi Slurp',                'id'    => 'slurp',      'regexp' => '/Slurp\/([0-9.]*)/u' ],
    [ 'name' => 'Inktomi Slurp',                'id'    => 'slurp',      'regexp' => '/Slurp\.so\/([0-9.]*)/u' ],

These regexes are broken and not working right now.

Current UAs being found on our test servers:

Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)

Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)

slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0

Slurp/2.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)

This PR will create a working regex.
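As a rough illustration of what a working pattern could look like, the sketch below accepts either a numeric version or a short alphabetic variant (such as `cat` or `si`) after `Slurp/` or `Slurp.so/`, and matches case-insensitively. The constant and function names are hypothetical, and this is not necessarily the regex the PR ended up with.

```php
<?php
// Hypothetical replacement pattern: "Slurp" or "Slurp.so", a slash, then either
// a numeric version or an alphabetic variant token.
const INKTOMI_SLURP_REGEXP = '/Slurp(?:\.so)?\/([0-9.]+|[a-z]+)/iu';

/** Returns the version or variant token after "Slurp/", or null when absent. */
function inktomiSlurpToken(string $ua): ?string
{
    return preg_match(INKTOMI_SLURP_REGEXP, $ua, $m) === 1 ? $m[1] : null;
}
```

Because this pattern also matches `Yahoo! Slurp/3.0`, the Yahoo! Slurp rule would need to be evaluated before it.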

summercms commented 4 years ago

List of keywords containing fake or dead bots that can be filtered and labelled:

MSIE 4
MSIE 5
MSIE 6
MSIE 7
MSIE 8
MSIE 9
MSIE 10
YahooYSMcm
siteexplorer
Slingstone
MMAudVid
Mindset
Yahoo Pipes
YahooVideoSearch
SiteChecker
YahooFeedSeeker

Note: Matching needs to be case-insensitive.
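A minimal sketch of such a filter, using the keywords from the list above and a case-insensitive substring search. The constant `DEAD_OR_FAKE_KEYWORDS` and the function `findDeadBotKeyword()` are hypothetical names for illustration only.

```php
<?php
// Keywords copied from the list above.
const DEAD_OR_FAKE_KEYWORDS = [
    'MSIE 4', 'MSIE 5', 'MSIE 6', 'MSIE 7', 'MSIE 8', 'MSIE 9', 'MSIE 10',
    'YahooYSMcm', 'siteexplorer', 'Slingstone', 'MMAudVid', 'Mindset',
    'Yahoo Pipes', 'YahooVideoSearch', 'SiteChecker', 'YahooFeedSeeker',
];

/** Returns the first keyword found in the user agent, or null when none match. */
function findDeadBotKeyword(string $ua): ?string
{
    foreach (DEAD_OR_FAKE_KEYWORDS as $keyword) {
        // stripos() is case-insensitive, covering upper and lowercase variants.
        if (stripos($ua, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}
```

Plain substring matching is the simplest option; it will also label any genuine old Internet Explorer client that sends one of the MSIE tokens, which appears to be the intent here.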


Also add Yahoo! Ad Monitoring to tests:

Desktop user agent

Mozilla/5.0 (compatible; Yahoo Ad Monitoring https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html) <YahooInternalTag>

Mobile user agent

Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html) <YahooInternalTag>

<YahooInternalTag> is used for internal request tracking.

summercms commented 4 years ago

Fake Yahoo Bots:

yahoo/Nutch-1.2 (yahoo; yahoo.com)

YahooBot/1.0

Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)

Update rules to spot these fakes.
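One possible starting point is a small set of signatures derived only from the three fake strings above. Everything in this sketch is an assumption for illustration: the names are hypothetical, and a user agent check alone cannot prove a bot is genuine, so it would complement rather than replace a reverse DNS check.

```php
<?php
// Hypothetical signatures derived only from the three fake strings above.
const FAKE_YAHOO_SIGNATURES = [
    '/^yahoo\/Nutch-/i',     // looks like a Nutch crawler configured with a "yahoo" agent name
    '/^YahooBot\//',         // no YahooBot token appears among the UAs documented in this thread
    '/Yahoo! Slurp;(?! )/',  // the genuine strings above put a space after the semicolon
];

/** Returns true when the user agent matches one of the known fake signatures. */
function looksLikeFakeYahooBot(string $ua): bool
{
    foreach (FAKE_YAHOO_SIGNATURES as $pattern) {
        if (preg_match($pattern, $ua) === 1) {
            return true;
        }
    }
    return false;
}
```

The third signature relies on a single missing space, so it is fragile and easy for a faker to correct.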

summercms commented 3 years ago

A clean version, with several adjustments in the Bot.php file: https://github.com/summercms/sc-parser-module/pull/172/files