Closed: summercms closed this 2 years ago.
This company pretty much died in the dot-com bubble days; see Wikipedia: https://en.wikipedia.org/wiki/Inktomi
However, its bots seem to have escaped onto the internet and are still being found to this day!
Current Regex:
[ 'name' => 'Inktomi Slurp', 'id' => 'slurp', 'regexp' => '/Slurp\/([0-9.]*)/u' ],
[ 'name' => 'Inktomi Slurp', 'id' => 'slurp', 'regexp' => '/Slurp\.so\/([0-9.]*)/u' ],
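The regex is broken and does not work right now. For example, neither pattern uses the i flag, so lowercase entries such as slurp/2.0 and slurp.so/1.0 are missed, and entries without a slash (slurp, inktomi slurp) can never match.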
Current UAs being found on our test servers:
Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0
Slurp/2.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)
This PR will create a working regex.
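A minimal sketch of what a case-insensitive pattern could look like, assuming the goal is to match every UA in the list above and keep only a numeric version. The constant SLURP_PATTERN and the function detectSlurp() are hypothetical; this is not the regex from the PR.

```php
<?php
// Hypothetical pattern (an assumption, not the PR's regex):
// - /i makes "Slurp", "slurp" and "SLURP" all match
// - (?:\.so)? also accepts the "slurp.so" form
// - the "/version" part is optional and only captures digits and dots,
//   so "Slurp/cat" and "Slurp/si" are detected without a version
const SLURP_PATTERN = '/\bslurp(?:\.so)?(?:\/([0-9.]+))?/iu';

function detectSlurp(string $ua): ?array
{
    if (preg_match(SLURP_PATTERN, $ua, $m) !== 1) {
        return null;
    }
    return [
        'name' => 'Inktomi Slurp',
        'id' => 'slurp',
        // Trailing unmatched groups are not reported by preg_match(),
        // so a missing version falls back to null.
        'version' => $m[1] ?? null,
    ];
}
```

With this sketch, slurp.so/1.0 yields version 1.0 while Slurp/cat yields no version. It would also match the fake Yahoo! Slurp UA listed further down, so fake filtering still has to be handled separately.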
List of keywords from fake or dead bots that can be filtered and labelled:
MSIE 4
MSIE 5
MSIE 6
MSIE 7
MSIE 8
MSIE 9
MSIE 10
YahooYSMcm
siteexplorer
Slingstone
MMAudVid
Mindset
Yahoo Pipes
YahooVideoSearch
SiteChecker
YahooFeedSeeker
Note: matching must be case-insensitive, so both upper- and lowercase variants are caught.
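The keywords listed above could be matched with a simple case-insensitive substring check. This is a sketch: the constant and function names are hypothetical, and plain substring matching is an assumption that may over-match short words such as Mindset inside unrelated UAs.

```php
<?php
// Keywords copied from the list above. Names are hypothetical.
const DEAD_OR_FAKE_KEYWORDS = [
    'MSIE 4', 'MSIE 5', 'MSIE 6', 'MSIE 7', 'MSIE 8', 'MSIE 9', 'MSIE 10',
    'YahooYSMcm', 'siteexplorer', 'Slingstone', 'MMAudVid', 'Mindset',
    'Yahoo Pipes', 'YahooVideoSearch', 'SiteChecker', 'YahooFeedSeeker',
];

// Returns the first keyword found in the UA, or null if none matches.
function matchDeadOrFakeKeyword(string $ua): ?string
{
    foreach (DEAD_OR_FAKE_KEYWORDS as $keyword) {
        // stripos() ignores case, so "yahoo pipes" and "YAHOO PIPES" both match.
        if (stripos($ua, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}
```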
Also add Yahoo! Ad Monitoring to the tests:
Desktop user agent
Mozilla/5.0 (compatible; Yahoo Ad Monitoring https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html) <YahooInternalTag>
Mobile user agent
Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html) <YahooInternalTag>
<YahooInternalTag> is used for internal request tracking.
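The desktop UA says "Ad Monitoring" while the mobile UA says "Ad monitoring", so a test for these needs a case-insensitive match. A hypothetical sketch (the function names are illustrative, not part of Parser-PHP) that also strips the internal tracking tag before any further parsing:

```php
<?php
// Hypothetical detector: /i is required because the two UAs differ in case.
function isYahooAdMonitoring(string $ua): bool
{
    return preg_match('/Yahoo Ad Monitoring/i', $ua) === 1;
}

// Hypothetical helper: drop the <YahooInternalTag> marker and surrounding spaces.
function stripYahooInternalTag(string $ua): string
{
    return trim(str_replace('<YahooInternalTag>', '', $ua));
}
```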
Fake Yahoo Bots:
yahoo/Nutch-1.2 (yahoo; yahoo.com)
YahooBot/1.0
Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)
Update the rules to spot these fakes.
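One conservative way to spot the fakes listed above is to compare against the exact strings rather than keywords, so genuine Yahoo UAs are not flagged by accident. This is a sketch; the constant and function names are hypothetical, and exact matching is an assumption chosen for safety, not the rule used in the PR.

```php
<?php
// Exact fake UA strings copied from the list above (hypothetical constant name).
const FAKE_YAHOO_UAS = [
    'yahoo/Nutch-1.2 (yahoo; yahoo.com)',
    'YahooBot/1.0',
    'Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)',
];

// True only when the whole UA (ignoring surrounding whitespace) is a known fake.
function isFakeYahooBot(string $ua): bool
{
    return in_array(trim($ua), FAKE_YAHOO_UAS, true);
}
```

Because the comparison is exact, a Slurp UA that differs by even one character (for example, with a space after the semicolon) is not flagged.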
Clean version, with several adjustments to the Bot.php file: https://github.com/summercms/sc-parser-module/pull/172/files
See github issue: https://github.com/WhichBrowser/Parser-PHP/issues/568
Yahoo! Slurp Bot
All these combinations are being used right now!
Yahoo! Slurp China Bot
Yahoo! Cache System Bot
Yahoo! Japan Bot (Y!J-BRW)
Yahoo! Japan Bot (Y!J-ASR)
Yahoo! Japan Bot (Y!J-SRD)
Yahoo! Seeker Testing Bot (2015 - 2019)
This bot has escaped onto the internet: its UA still says IE5.5 and Mozilla 4.0. Keep tracking this in the repo.
Yahoo! Link Preview Bot
Yahoo! Mail Proxy Bot
Yahoo! Image Bot
Yahoo! Ad Monitoring
Desktop user agent
Mobile user agent
Link: https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html
Reverse DNS
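A UA string is easy to forge, so a common complementary check is forward-confirmed reverse DNS: resolve the client IP to a hostname, check the hostname's domain, then resolve that hostname back and confirm it returns the same IP. A hedged sketch: the function names are hypothetical, the allowed hostname suffixes must come from Yahoo's own documentation (none is hard-coded here), and PHP's gethostbynamel() only returns IPv4 addresses.

```php
<?php
// True if $host equals one of the suffixes or is a subdomain of it.
function hostMatchesSuffix(string $host, array $suffixes): bool
{
    $host = strtolower(rtrim($host, '.'));
    foreach ($suffixes as $suffix) {
        $suffix = strtolower(ltrim($suffix, '.'));
        if ($host === $suffix || substr($host, -strlen($suffix) - 1) === '.' . $suffix) {
            return true;
        }
    }
    return false;
}

// Forward-confirmed reverse DNS. Resolvers are injectable for testing;
// by default they are PHP's gethostbyaddr() and gethostbynamel().
function verifyByReverseDns(
    string $ip,
    array $suffixes,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);
    // gethostbyaddr() returns the IP unchanged (or false) when there is no PTR record.
    if (!is_string($host) || $host === $ip || !hostMatchesSuffix($host, $suffixes)) {
        return false;
    }
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

Passing the resolvers as callables lets the logic be exercised without network access; in production the defaults perform real DNS lookups.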
Dead Bots
Y!J-ASR/1.0 (2014 - 2015)
Y!J-BRJ/YATS (2012 - 2014)
Y!J-BRO/YFSJ (2011 - 2014)
YahooYSMcm/3.0.0 (2014)
Y!J-BSC/1.0 (2009 - 2014)
Yahoo! Slurp (2014)
Y!J-BRW/1.0 (2011 - 2013)
Y!J-BRI/0.0.1 (2009 - 2012)
Yahoo-MMCrawler/4.0 (2009 - 2010)
Yahoo! Site Explorer Feed Validator (2010)
YahooSeeker (2009)
Yahoo! Blogs (2007)
Link: https://web.archive.org/web/20070208072346/http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html
The user agent says IE5.5, which is way too old.
Yahoo! Slingstone / Yahoo! Link Expander (2013)
Link: https://web.archive.org/web/20140217025511/https://www.webmasterworld.com/search_engine_spiders/4629974.htm
Test Link: https://thadafinser.github.io/UserAgentParserComparison/v5/user-agent-detail/44/fe/44fefeee-7071-4bf1-8348-02d49f661776.html
Yahoo! Video (2005 - 2015)
The bot Yahoo-MMAudVid has not been used for years now! The user agent says IE7, which is way too old.
Yahoo! Mindset (2005)
Link: https://www.askdavetaylor.com/whats_yahoo_mindset/
The UA was:
Yahoo! Product Search (2005 - 2008)
Link: https://corsodicrm.files.wordpress.com/2008/05/seo-web-developer-cheat-sheet.pdf
The user agent says IE5.5, which is way too old.
Yahoo! Pipes (2015)
The UA says Firefox/3.5.2, which is way too old. As per Wikipedia:
Defunct as of June 30, 2015
Link: https://en.wikipedia.org/wiki/Yahoo!_Pipes
Yahoo! Japan (1995)
Remove this:
The above regex finds the following UA combinations:
The UA contains Mozilla/4.0, IE 5 and Windows 95, which is way too old.
Yahoo! Video Search (2007 - 2008)
Remove this:
Finds this UA from 2007:
Yahoo! Site Checker (2001)
Remove this:
The above regex finds a user agent that was crawling the web when IE6 and Netscape 5.x were popular.
Yahoo! Feed Seeker (1999)
Remove this:
Finds the following UA:
The UA contains Mozilla/4.0 and IE 5.5, which is way too old.
Dead bots are not going to be added in this PR and will be removed from this repo.
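The "way too old" reasoning used for the dead bots above could be approximated with a version check on the tokens the UAs advertise. A hedged sketch: looksOutdated() and its default Firefox threshold are assumptions for illustration, not rules from this repo.

```php
<?php
// Hypothetical heuristic: flag UAs advertising MSIE 10 or older, Windows 95,
// or a Firefox major version below $minFirefoxMajor (default is an assumption).
function looksOutdated(string $ua, int $minFirefoxMajor = 4): bool
{
    // Parsing the number avoids substring traps such as "MSIE 1" vs "MSIE 11".
    if (preg_match('/\bMSIE (\d+)/i', $ua, $m) === 1 && (int) $m[1] <= 10) {
        return true;
    }
    if (stripos($ua, 'Windows 95') !== false) {
        return true;
    }
    if (preg_match('/\bFirefox\/(\d+)/i', $ua, $m) === 1 && (int) $m[1] < $minFirefoxMajor) {
        return true;
    }
    return false;
}
```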
Link: https://www.yahoo.com/
Link: https://www.yahoo.co.jp/
Link: https://help.yahoo.com/kb/search-for-desktop/SLN22600.html?impressions=true