Closed lkraav closed 1 year ago
Writing a simple integration override plugin, it dawned on me that it would be useful if the magic number `$blog_public = 2` were defined as a class constant, so that not only internal code but also outside code could reference it consistently.
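A minimal sketch of what this could look like (the constant name `BLOG_PUBLIC_RESTRICTED` is hypothetical; the plugin currently hard-codes the value `2` inline):

```php
class Restricted_Site_Access {
	// Hypothetical named constant for the blog_public value stored
	// when restriction is enabled. External code could then reference
	// Restricted_Site_Access::BLOG_PUBLIC_RESTRICTED instead of the
	// magic number 2.
	const BLOG_PUBLIC_RESTRICTED = 2;
}

// In an integration override plugin:
if ( (int) get_option( 'blog_public' ) === Restricted_Site_Access::BLOG_PUBLIC_RESTRICTED ) {
	// Site access is restricted; adjust integration behavior here.
}
```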
@lkraav could you help provide a use case where this would be useful? If I'm understanding your request correctly, then I'm unable to foresee a scenario where I'd want to restrict visitors by IP while letting search bots index the site (if that's truly what you're asking).
But that's exactly what's currently happening, if you filter the `robots` query variable to get through the restriction.
Of course, with no filter in place, `robots.txt` requests get redirected according to the configured behavior, but I believe that's not optimal either.
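A sketch of the kind of filtering described above, using RSA's restriction filter (check the filter name against the version of the plugin you have installed; this is an illustration, not the plugin's documented API):

```php
// Let robots.txt requests bypass the access restriction by short-circuiting
// RSA's restriction check when the 'robots' query variable is set.
add_filter( 'restricted_site_access_is_restricted', function ( $is_restricted, $wp ) {
	if ( isset( $wp->query_vars['robots'] ) ) {
		return false; // robots.txt query gets through the restriction.
	}
	return $is_restricted;
}, 10, 2 );
```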
@lkraav after internal discussion, our intention is to keep the functionality of Restricted Site Access as-is in relation to this issue. We'll work to update our documentation and readme files in a related PR to more clearly reflect this. Thank you for calling this to our attention, you're helping to ensure we're best representing Restricted Site Access to the community!
My goal using the Restricted Site Access plugin was to (a) restrict site access based on IP address, and (b) stop search engines from crawling the website. I was surprised to find that (a) was working fine, but the website was still being crawled by Google. Each search result then ended in a 404. In my opinion it would be much better if the site were not crawled at all, or if you had a choice (although I cannot think of a situation where you would want your full site under restricted access but still crawled). Do I understand correctly that you've decided to leave things as-is? For what reason?
Seems like that `Disallow` directive will be removed from WordPress itself soon: https://make.wordpress.org/core/2019/09/02/changes-to-prevent-search-engines-indexing-sites/
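If the core default goes away, a plugin could still emit the directive itself through core's `robots_txt` filter, which receives the generated output and the site's public flag. A minimal sketch:

```php
// Append a blanket Disallow rule when the site is not public, rather
// than relying on the core default that the linked post removes.
add_filter( 'robots_txt', function ( $output, $public ) {
	if ( ! $public ) {
		$output .= "Disallow: /\n";
	}
	return $output;
}, 10, 2 );
```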
Perhaps we should think about the meta tag a bit more? Not sure if that's actually better covered by something like an SEO plugin, though. Would be worthy of a separate issue to discuss.
As I understand it, Google will eventually remove redirected URLs from its results. But to achieve that, we need to allow Google to crawl the site so that it learns about the redirects we have in place. So RSA-enabled sites shouldn't be blocked by `robots.txt`, which prevents search engines from detecting the redirections.
Let's examine all RSA options:

- **Send them to the WordPress login screen** and **Show them a simple message**: these two pages have meta `robots` set to `noindex`.
- **Redirect them to a specified web address**: search engine visibility depends on the redirect target.
- **Show them a page**: this page is indexable by search engines (for now). IMO, this is the only place where we need to control search engine visibility with a meta tag.

> but the website was still being crawled by Google

For this, the only case I can think of is: the site was crawled before, and the URLs are cached by Google.
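A sketch of controlling that last case with a meta tag, using `wp_no_robots()` (the core helper available in the WordPress versions current to this thread); the option name storing the landing page ID is hypothetical:

```php
// Force a noindex meta tag onto the "Show them a page" landing page.
// 'rsa_landing_page_id' is a hypothetical option name for illustration.
add_action( 'wp_head', function () {
	$landing_page_id = (int) get_option( 'rsa_landing_page_id' );
	if ( $landing_page_id && is_page( $landing_page_id ) ) {
		wp_no_robots(); // prints <meta name='robots' content='noindex,follow' />
	}
}, 1 );
```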
Goal: ability to selectively combine effects of

Expected `/robots.txt` output:

It's almost like "Restricted site access" or "Discourage search engines" should become an "add-on" type checkbox, instead of a "pick one" radio button.
Your thoughts?
EDIT I wonder if there is a clever way of hooking into https://github.com/WordPress/WordPress/blob/5.0.3/wp-includes/functions.php#L1314
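Assuming the linked core line sits in the `robots.txt` handling (core fires a `do_robotstxt` action and applies a `robots_txt` filter when serving `/robots.txt`), a hook could look like this sketch:

```php
// Fires while core is printing robots.txt, before/around the default rules.
add_action( 'do_robotstxt', function () {
	echo "# Restricted Site Access is active\n";
} );
```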