bnewbold opened 1 year ago
ca-certs were added in: https://github.com/bluesky-social/indigo/pull/36
A crawler User-Agent with some contact info included is the next-highest priority, IMO.
Here are some example User-Agents, showing how contact info is included:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36
Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
webprosbot/2.0 (+mailto:abuse-6337@webpros.com)
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com)
Dreamwidth Studios (webmaster@dreamwidth.org; for http://www.dreamwidth.org/users/bnewbold_links_feed/; 1 readers)
GarlikCrawler/1.2 (http://garlik.com/, crawler@garlik.com)
A couple of things we should look at for bigsky, given that it reaches out to and crawls an increasingly large number of hosts on the web:
/xrpc/...