getnikola / nikola

A static website and blog generator
https://getnikola.com/
MIT License

Duplicate content in sitemap #836

Closed da2x closed 11 years ago

da2x commented 11 years ago

XML sitemaps should not contain duplicate content. Search engines use sitemaps as one signal for determining the preferred (canonical) location of content.

Nikola includes these duplicates:

http://www.example.com/
http://www.example.com/index.html
http://www.example.com/2013/
http://www.example.com/2013/index.html

I suggest including only the full version (/index.html). Use the shorter version (/) if the STRIP_INDEXES option is set to True.

The rationale for preferring the full address is that popular servers will return it in their Location header, and that it is easier to set up. (It actually requires no setup other than a running server.) STRIP_INDEXES also defaults to False.
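
To make the proposed rule concrete, here is a minimal sketch in Python. It is not Nikola's actual sitemap code: the function name `canonical_sitemap_urls` and its parameters are hypothetical, and it assumes every directory URL is backed by an index.html file (so it does not model SITEMAP_INCLUDE_FILELESS_DIRS).

```python
def canonical_sitemap_urls(urls, strip_indexes=False, index_file="index.html"):
    """Collapse '/dir/' and '/dir/index.html' into a single sitemap entry.

    Hypothetical helper for illustration only; assumes each directory URL
    has a matching index file.
    """
    seen = set()
    result = []
    for url in urls:
        if url.endswith("/" + index_file):
            # Drop the file name but keep the trailing slash.
            directory = url[: -len(index_file)]
        elif url.endswith("/"):
            directory = url
        else:
            directory = None  # a regular page, not a directory index

        if directory is None:
            canonical = url
        elif strip_indexes:
            canonical = directory               # short form: /2013/
        else:
            canonical = directory + index_file  # full form: /2013/index.html

        # Keep the first occurrence and preserve input order.
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


urls = [
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://www.example.com/2013/",
    "http://www.example.com/2013/index.html",
]
full_form = canonical_sitemap_urls(urls)
short_form = canonical_sitemap_urls(urls, strip_indexes=True)
```

With the four URLs listed above, each call yields two entries instead of four: the /index.html forms by default, and the trailing-slash forms when `strip_indexes` is True.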

The SITEMAP_INCLUDE_FILELESS_DIRS option seems to have no effect on these duplicates.

da2x commented 11 years ago

Patch coming up.

Kwpolska commented 11 years ago

PS. It would be nice to use close/closed/closes/fix/fixed/fixes/resolve/resolved/resolves #836 in your commit messages (choose one of those nine words, whichever is your favorite; the Grammar Nazi in me suggests those not ending with d, and some people dislike the ones ending with s as well).