SarsTW / sitemap-generators

Automatically exported from code.google.com/p/sitemap-generators

Migrated feature: "Obey robots.txt" submitted by cybersaga on 2005-06-03 #16

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Original feature listed here:
http://sourceforge.net/tracker/index.php?func=detail&aid=1214402&group_id=137793&atid=739386

As of this release, there are three ways to generate a
sitemap: specifying URLs, specifying paths, or using logs.

However, I would imagine that many administrators will
use these methods to mimic exactly what their
robots.txt file specifies. Why not make this easier?

Something like this:
<robots url="http://www.example.com/robots.txt"
path="/var/www/html" bot="googlebot" />

url: Address of robots.txt.

path: Root path of the site.

bot: Interest was shown in other search engines using
this software. This attribute will allow the sitemap
generation to follow the rules for a certain bot.
Values would include a bot name, or "*" to follow all
rules, regardless of the bot they are meant for.
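
To make the bot attribute concrete, here is a minimal sketch of how the rules for one bot might be picked out of a robots.txt file. The function names parse_groups and disallowed_prefixes are hypothetical and not part of the existing generator. The sketch assumes that a named bot uses the groups that mention it and falls back to the "*" groups otherwise, while bot="*" takes the meaning proposed above: every rule, regardless of the bot it was written for. Allow lines and wildcard patterns are ignored to keep it short.

```python
def parse_groups(robots_txt):
    """Split robots.txt text into (user_agents, disallow_prefixes) groups."""
    groups = []
    agents, rules, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                       # a new group starts here
                groups.append((agents, rules))
                agents, rules, in_rules = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # empty Disallow blocks nothing
                rules.append(value)
    if agents:
        groups.append((agents, rules))
    return groups


def disallowed_prefixes(robots_txt, bot):
    """Return the Disallow prefixes to apply for the given bot attribute."""
    groups = parse_groups(robots_txt)
    if bot == "*":
        chosen = groups                        # proposal: follow all rules
    else:
        name = bot.lower()
        chosen = [g for g in groups
                  if any(a and a != "*" and a in name for a in g[0])]
        if not chosen:                         # no specific group: use "*"
            chosen = [g for g in groups if "*" in g[0]]
    prefixes = []
    for _agents, rules in chosen:
        for rule in rules:
            if rule not in prefixes:
                prefixes.append(rule)
    return prefixes
```

Each returned prefix could then be turned into one drop rule for the generated sitemap.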

This would essentially mimic a directory element, and a
few filter elements based on the rules within robots.txt.
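
As a rough illustration of that equivalence, the sketch below walks the root path, maps each file to a URL, and keeps only the URLs the bot may fetch. It relies on Python's standard urllib.robotparser module; the function name allowed_urls and its parameters are hypothetical. Note one assumption: the standard parser treats "*" as the default group rather than "all rules", so the "*" value proposed above would need separate handling. Aliased directories are not covered here.

```python
import os
from urllib.parse import quote, urljoin
from urllib.robotparser import RobotFileParser


def allowed_urls(root_path, base_url, robots_lines, bot):
    """List URLs for files under root_path that robots.txt lets bot fetch.

    base_url is the site URL that root_path is served from and should end
    with "/". robots_lines is the robots.txt content as a list of lines.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)                 # no network access needed
    urls = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root_path)
            # Turn the file system path into a URL path under base_url.
            url = urljoin(base_url, quote(rel.replace(os.sep, "/")))
            if parser.can_fetch(bot, url):     # acts like the filter step
                urls.append(url)
    return sorted(urls)
```

With the attributes from the example element, a call might look like allowed_urls("/var/www/html", "http://www.example.com/", lines, "googlebot"), where lines holds the downloaded robots.txt content.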

Details will have to be ironed out, taking into account
aliased directories that a bot would see but that are not
visible on the file system.

Thus, creation of the sitemap will follow the same
rules a bot would.

Original issue reported on code.google.com by api.ma...@gmail.com on 13 Aug 2007 at 7:48