pastakhov closed this issue 6 months ago.
Google likes to complain; I would keep things aligned with the official instructions at https://www.mediawiki.org/wiki/Manual:Robots.txt. Also, allowing crawling of `/w/index.php?` may put unnecessary load on the wiki, with bots trying to load various diffs and history pages, which are usually resource-heavy.
I agree that images and thumbnails can be whitelisted.
> Allowing crawling of `/w/index.php?` may put unnecessary load on the wiki, with bots trying to load various diffs and history pages, which are usually resource-heavy.

I agree.
I think it is better when crawlers can index images (`Allow: /w/thumb.php?` should be included as well). If robots.txt does not allow access to `/w/sitemap`, the crawler can't access the sitemap files.
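A minimal sketch of how the whitelisting discussed above could look, assuming a short-URL layout (articles under `/wiki/`, scripts and uploads under `/w/`) with a blanket `Disallow: /w/`, and a hypothetical host `wiki.example.org`. It uses Python's standard `urllib.robotparser` to check which URLs stay crawlable; the sitemap file name and the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a short-URL wiki: articles under /wiki/,
# scripts and uploads under /w/. The Allow lines come first because
# urllib.robotparser applies the first matching rule, while Google applies
# the longest (most specific) match; this order gives the same result for both.
ROBOTS_TXT = """\
User-agent: *
Allow: /w/images/
Allow: /w/thumb.php
Allow: /w/sitemap
Disallow: /w/
"""

BASE_URL = "https://wiki.example.org"  # hypothetical host


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser: RobotFileParser, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch BASE_URL + path under the parsed rules."""
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in (
        "/wiki/Main_Page",
        "/w/index.php?title=Main_Page&diff=prev&oldid=1",
        "/w/images/a/ab/Example.png",
        "/w/thumb.php?f=Example.png&width=200",
        "/w/sitemap/sitemap-index.xml",  # hypothetical sitemap file name
    ):
        status = "allowed" if crawlable(rp, path) else "blocked"
        print(f"{status:8} {path}")
```

With these rules, article pages, images, thumbnails and sitemap files stay crawlable, while diff and history URLs under `/w/index.php?` remain blocked, which is the trade-off discussed above.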
Probably `/w/index.php?` should be allowed as well. If I'm not wrong, all HTML served from `/w/index.php?` URLs contains the `<meta name="robots" content="noindex,nofollow">` tag, but crawlers want to be able to fetch those pages and check for it anyway. This is a question for an SEO specialist. I just saw that Google's crawler complained when it was not allowed to crawl those pages.
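The reason crawlers want access is that the `noindex` tag lives inside the page, so it can only be read after the page is fetched; a URL blocked in robots.txt never has its tag seen. A rough sketch of that check using Python's standard `html.parser`; the class and function names are illustrative, and the sample HTML is a hypothetical stand-in for a history page, not real MediaWiki output.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attribute values
        # may be None (e.g. a bare <meta name>), so default them to "".
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex,nofollow".
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def indexable(html: str) -> bool:
    """A page may be indexed unless its robots meta tag says noindex."""
    return "noindex" not in robots_directives(html)


if __name__ == "__main__":
    # Hypothetical head of a history page served from /w/index.php?
    history_page = (
        '<html><head><meta charset="UTF-8">'
        '<meta name="robots" content="noindex,nofollow">'
        "</head><body>...</body></html>"
    )
    print(sorted(robots_directives(history_page)))  # ['nofollow', 'noindex']
    print(indexable(history_page))  # False
```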