Robots.txt is too restrictive

mscherer commented 6 years ago

Following the last issue (#154), I tried to run linkchecker, but it appears to fail because of the current robots.txt, which seems slightly restrictive:

$ curl https://www.gluster.org/robots.txt
User-agent: *
Disallow: /

Could we open it up a bit so that linkchecker can work?
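To see why a crawler gives up here, the two-line file above can be checked with Python's standard `urllib.robotparser` module. This is a minimal sketch, assuming linkchecker honors robots.txt the way any well-behaved crawler does; the `allowed` helper, the sample URLs, and the agent names are illustrative and not taken from linkchecker itself.

```python
from urllib.robotparser import RobotFileParser

# robots.txt served by https://www.gluster.org at the time of the report
CURRENT_ROBOTS = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in ("https://www.gluster.org/", "https://www.gluster.org/blog/"):
    print(url, allowed(CURRENT_ROBOTS, url))
```

Because the rule applies to `User-agent: *` and disallows the root path `/`, every URL on the site is refused, whatever agent name is passed.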

mscherer commented 6 years ago

Another side effect of this restriction is that Google and other search engines are likely to drop the website from their indexes. Could this be given a bit more consideration?

mscherer commented 6 years ago

Still valid

amye commented 6 years ago

This is set by the hosting company. What would be the preferred setting?

mscherer commented 6 years ago

It is up to the person responsible for the web presence to decide what should be indexed and what should not.

But looking at the WP Engine documentation (https://wpengine.com/support/read-use-robots-txt/), I would go for the second option listed there.

Or, following another resource (http://www.robotstxt.org/robotstxt.html), maybe:

User-agent: *
Disallow:

is the simplest option, since an empty Disallow value gives all robots full access.
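As a quick sanity check, Python's standard `urllib.robotparser` can compare the file currently served with the permissive one suggested here. This is a minimal sketch; the `allowed` helper, the sample URL, and the `ExampleBot` agent name are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The restrictive file currently served, and the permissive alternative.
CURRENT = "User-agent: *\nDisallow: /\n"
SUGGESTED = "User-agent: *\nDisallow:\n"


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for name, robots in (("current", CURRENT), ("suggested", SUGGESTED)):
    verdict = allowed(robots, "https://www.gluster.org/blog/")
    print(f"{name}: /blog/ crawlable = {verdict}")
```

The only difference between the two files is the value after `Disallow:`; an empty value matches no path, so nothing is blocked.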

Right now, the fact that Google does not index the website reduces outreach for the blog, and likely worsens the problem of the documentation being hard to find, as reported in https://github.com/gluster/community/issues/17.

amye commented 6 years ago

Updated: robots.txt now disallows only the wp-admin page.
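The exact rules applied by the hosting company are not shown in this thread. Assuming the common WordPress pattern of `Disallow: /wp-admin/`, a minimal sketch with Python's `urllib.robotparser` shows the intended result: the admin area stays blocked while the rest of the site becomes crawlable. The file contents, the `crawlable` helper, and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the updated file: only wp-admin is blocked.
UPDATED = """\
User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(UPDATED.splitlines())


def crawlable(path: str, agent: str = "ExampleBot") -> bool:
    """Check a site-relative path against the parsed robots.txt."""
    return parser.can_fetch(agent, "https://www.gluster.org" + path)


for path in ("/", "/blog/", "/wp-admin/", "/wp-admin/options.php"):
    print(f"{path:25} {crawlable(path)}")
```

Disallow rules are prefix matches, so anything under `/wp-admin/` is refused and every other path falls through to the default of "allowed".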

gluster / glusterweb

Robots.txt is too restrictive #155