ethercreative / seo

SEO utilities including a unique field type, sitemap & redirect manager
MIT License

All pages have "X-Robots-Tag: none, noimageindex" header #244

Open piotrpog opened 5 years ago

piotrpog commented 5 years ago

I installed this plugin only for the sitemap functionality, but I recently noticed that it attaches an X-Robots-Tag: none, noimageindex HTTP header to every page by default.

Why is that? Can I fix it somehow?

alexjcollins commented 5 years ago

@piotrpog Hmm, that's new to me. Do you have a site that's currently exhibiting this behaviour you could link to? Is the header on all pages or just sitemap pages?

piotrpog commented 5 years ago

@alexjcollins It is on all pages, not only those in the sitemap. For the moment I cannot link to the website, because I disabled the plugin to make my website appear in Google again.

bymayo commented 5 years ago

@alexjcollins We're also having this issue, you can see it on the header - https://jonesfoster.com

Our hosting provider came back with:

# grep -rnw /xxx/xxx -e 'X-Robots-Tag'
/xxx/vendor/ether/seo/src/services/SeoService.php:18:  * Adds the `X-Robots-Tag` header to the request if needed.

So something around that line is adding it. The SEO plugin is 3.4.4, but we can't update to the latest just yet.

bymayo commented 5 years ago

I've updated to Craft 3.3.3 and SEO 3.6.2 and the problem is still there.

Just for clarification: our client pointed this out when they tried to put their site into Google Search Console and it wouldn't index any pages.

EDIT: It seems it only applies when dev mode is on! 🤦‍♂️ Case closed... But it might be worth mentioning this in the docs.

rolfkokkeler commented 4 years ago

I had the same issue. We switched production to devMode true just to quickly see what the exact error was. We changed back to devMode false and cleared the cache; somehow, however, this did not remove the header. Unfortunately we did not notice this until our client experienced a drop in ranking and notified us.

It may be safer to key this not on devMode but on the environment setting: anything other than 'production', perhaps.
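
For illustration, the kind of check being suggested might look like this; the variable name and the allowlist are assumptions for the example, not the plugin's actual code:

<?php
// Sketch: treat anything that is not explicitly production-like as
// non-indexable, rather than keying the decision off devMode alone.
// Fall back to 'dev' (and therefore blocking) when the variable is unset.
$environment = getenv('CRAFT_ENVIRONMENT') ?: 'dev';

if (!in_array($environment, ['production', 'prod', 'live'], true)) {
    // Anything "less than" production gets the blocking header.
    header('X-Robots-Tag: none, noimageindex');
}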

puck3000 commented 4 years ago

It looks like I have the same problem: the site was in dev mode, then changed to production via the .env file. But robots.txt is still set to "User-agent: * Disallow: /".

How did you manage to apply the change to prod?

puck3000 commented 4 years ago

PS: the site is https://www.anis.ch

alexjcollins commented 4 years ago

@puck3000 Is the site definitely in production mode? Also, what does your system Robots setting look like? Here's the default for reference:

[Screenshot: the default Robots system setting]

puck3000 commented 4 years ago

@alexjcollins Thank you for caring ;-) Yes, the site definitely is in production; the ENVIRONMENT variable in .env is set to "production", and the robots settings are untouched and look the same as in your screenshot.

alexjcollins commented 4 years ago

@puck3000 Thanks for the reply.

Okay, that's really strange – if the robots settings are identical, you should have a sitemap reference at the top of your robots.txt file.

Is there any chance that you already have a physical robots.txt file in /web that could be overriding the plugin generated version?

puck3000 commented 4 years ago

Hi Alex, I checked and no, there's no robots.txt file. Then I tried to add one, and strangely, even if I add a "physical" robots.txt to the web root, I still see

User-agent: *
Disallow: /

on anis.ch/robots.txt... When I place another file, like text.txt, in the web root, it works as it should.

Is there any other place where this "wrong" robots.txt could be generated?

alexjcollins commented 4 years ago

@puck3000 When in production mode, do you have devMode set to true in config/general.php?

puck3000 commented 4 years ago

no, it is only set in dev mode:


<?php
/**
 * General Configuration
 *
 * All of your system's general configuration settings go in here. You can see a
 * list of the available settings in vendor/craftcms/cms/src/config/GeneralConfig.php.
 *
 * @see \craft\config\GeneralConfig
 */

return [
    // Global settings
    '*' => [
        // Default Week Start Day (0 = Sunday, 1 = Monday...)
        'defaultWeekStartDay' => 1,

        // Whether generated URLs should omit "index.php"
        'omitScriptNameInUrls' => true,

        // Control Panel trigger word
        'cpTrigger' => 'admin',

        // The secure key Craft will use for hashing and encrypting data
        'securityKey' => getenv('SECURITY_KEY'),

        // Whether to save the project config out to config/project.yaml
        // (see https://docs.craftcms.com/v3/project-config.html)
        'useProjectConfigFile' => false,
    ],

    // Dev environment settings
    'dev' => [
        // Dev Mode (see https://craftcms.com/guides/what-dev-mode-does)
        'devMode' => true,
    ],

    // Staging environment settings
    'staging' => [
        // Set this to `false` to prevent administrative changes from being made on staging
        'allowAdminChanges' => true,
    ],

    // Production environment settings
    'production' => [
        // Set this to `false` to prevent administrative changes from being made on production
        'allowAdminChanges' => true,
    ],
];

Should I set it explicitly to false in production?
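
For illustration, setting it explicitly would be one extra line in the production block (a sketch of the same config file, abbreviated):

<?php
return [
    // ... global, dev and staging settings as above ...

    // Production environment settings
    'production' => [
        'allowAdminChanges' => true,

        // Explicitly disable Dev Mode in production
        'devMode' => false,
    ],
];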

alexjcollins commented 4 years ago

@puck3000 Might be worth giving it a go, although I'm pretty sure it'll be false by default.

puck3000 commented 4 years ago

@alexjcollins Sadly you were right; setting it explicitly didn't change anything...

alexjcollins commented 4 years ago

@puck3000 It’s a big ask, but is there any possibility of sending over your site files and a database dump?

If you can, please send them to alex@ethercreative.co.uk

uaextension commented 4 years ago

Possibly having the same issue with X-Robots-Tag: none, noimageindex being applied, but I cannot find the source. Is there something I can look up or change to remove the tag?

charlietriplett commented 4 years ago

I had the same issue: ENVIRONMENT="production", but if devMode is true in general.php, it activates the X-Robots-Tag!

I ran into this while migrating content: devMode was true in production while I was troubleshooting, and I left it on in case something came up.

This resulted in about 60 important pages being unindexed over a couple of days.

I tested with https://search.google.com/search-console in both settings and found this to be the culprit.

I'm now wary, but it would be nice to have the option to ignore the X-Robots-Tag based on the general.php config settings. Was this a holdover from Craft CMS 2?

RyanRoberts commented 3 years ago

I've also discovered this bug; for me it was because I had ENVIRONMENT=live instead of ENVIRONMENT=production. That's a pretty severe bug for an SEO plugin to have.

bertoost commented 3 years ago

Facing this issue as well. I used dev, staging and prod instead; the plugin literally checks for the string 'production'. Not good.

src/services/SeoService.php, line 26:
if (CRAFT_ENVIRONMENT !== 'production')
{
    $headers->set('x-robots-tag', 'none, noimageindex');
    return;
}

Neither a plugin nor Craft itself should hard-code this.
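
For comparison, a sketch of the same check keyed off Craft's own devMode setting instead of a hard-coded environment string (an illustration, not the plugin's actual code):

<?php
// Illustration only: assumes this runs inside a bootstrapped Craft request.
// Decide on Craft's devMode setting rather than comparing CRAFT_ENVIRONMENT
// against the literal string 'production'.
if (Craft::$app->getConfig()->getGeneral()->devMode) {
    Craft::$app->getResponse()
        ->getHeaders()
        ->set('x-robots-tag', 'none, noimageindex');
}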

jamiematrix commented 2 years ago

Thank you @bertoost! I was looking for a solution after Google reported that the x-robots-tag was stopping a client site from being crawled. Our env was set to prod.

ineghi commented 2 years ago

Even removing

if (CRAFT_ENVIRONMENT !== 'production')
{
    $headers->set('x-robots-tag', 'none, noimageindex');
    return;
}

does not solve the issue; x-robots-tag is then set to none.

jesuismaxime commented 2 years ago

Still got that issue with Craft CMS 4 version.

// services/SeoService.php

$env = getenv('ENVIRONMENT') ?? getenv('CRAFT_ENVIRONMENT');

If I use CRAFT_ENVIRONMENT and not ENVIRONMENT, $env returns false from the line above, so the header is set to block robots.

If I include both (which is ridiculous), as above, it now works:

ENVIRONMENT=production
CRAFT_ENVIRONMENT=production

Why is that condition not based on the devMode or disallowRobots Craft config settings? That way you wouldn't have to include specific environment settings that may differ from one dev to another.
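
This matches a PHP subtlety: getenv() returns false, not null, for an unset variable, and ?? only falls back on null, so the getenv('CRAFT_ENVIRONMENT') call after ?? is effectively dead code. A standalone demonstration:

<?php
// ENVIRONMENT is deliberately left unset; only CRAFT_ENVIRONMENT is defined.
putenv('CRAFT_ENVIRONMENT=production');

// getenv() returns false (not null) for an unset variable, and ??
// short-circuits only on null, so the fallback is never reached.
$env = getenv('ENVIRONMENT') ?? getenv('CRAFT_ENVIRONMENT');
var_dump($env); // bool(false)

// The ?: operator treats false as falsy and would fall back as intended.
$env = getenv('ENVIRONMENT') ?: getenv('CRAFT_ENVIRONMENT');
var_dump($env); // string(10) "production"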

pascalminator commented 2 years ago

You just saved my life. Adding ENVIRONMENT=production on top of CRAFT_ENVIRONMENT=production fixed it for me.

jesuismaxime commented 2 years ago

@pascalminator You're welcome! Still, I would like a follow-up from the creators 😆

SkermBE commented 2 years ago

I'm still having this issue; the pages are not being found. Pretty big issue! These are my configs:

.env

ENVIRONMENT=production
CRAFT_ENVIRONMENT=production

config/general.php

[Screenshot: config/general.php settings]

Robots settings inside SEO plugin:

[Screenshot: the SEO plugin's Robots settings]

Running all of this on Craft CMS v4.2.3 and ether/seo v4.0.3.

When surfing to my domain.com/robots.txt I still get this:

User-agent: *
Disallow: /cpresources/
Disallow: /vendor/
Disallow: /.env

[Screenshot: Google Search Console page indexing report]

jamiematrix commented 2 years ago

@SkermBE

The URL next to "Referring Page" in your page indexing screenshot is indeed blocking Googlebot:

User-agent: Googlebot
Disallow: /?*

User-agent: Baiduspider
Disallow: /?*

User-agent: YandexBot
Disallow: /?*

User-agent: ichiro
Disallow:  /?*

User-agent: sogou spider
Disallow:  /?*

User-agent: Sosospider
Disallow: /?*

User-agent: YoudaoBot
Disallow: /?*

User-agent: YetiBot
Disallow: /?*

User-agent: bingbot
Crawl-delay: 2
Disallow: /?*

User-Agent: Yahoo! Slurp 
Crawl-delay: 2
Disallow: /?*

User-agent: rdfbot
Disallow: /?*

User-agent: Seznambot 
Request-rate: 1/2s
Disallow: /?*

User-agent: ia_archiver
Disallow: 

User-agent: Mediapartners-Google
Disallow: 

Is this the correct domain?

When surfing to my domain.com/robots.txt I still get this: User-agent: * Disallow: /cpresources/ Disallow: /vendor/ Disallow: /.env

The SEO settings screenshot shows this is correct, as you're in production mode.

SkermBE commented 2 years ago

@jamiematrix

I know nothing about that referring page. When I surf to my own domain (which is not 4rank.bid or some weird thing) I get the robots.txt as mentioned:

User-agent: *
Disallow: /cpresources/
Disallow: /vendor/
Disallow: /.env

But Google Search Console is still saying it's being blocked. When doing a live test now, this is the result:

[Screenshot: Google Search Console live test result]

Arno-Ramon commented 2 years ago

Seems to be fixed with: https://github.com/ethercreative/seo/issues/432

BigglesZX commented 2 years ago

Just got mega burned by this. Thanks to those who found and submitted PRs.

cstudios-slovakia commented 1 year ago

Here is a hotfix for the template; it does not solve the bug, just suppresses the symptoms:

{% if craft.app.config.env == 'production' %}
   {% header "X-Robots-Tag: all" %}
{% else %}
   {% header "X-Robots-Tag: noindex, nofollow, none" %}
{% endif %}
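
Another way to suppress the symptom (not the bug itself) would be stripping the header just before each response is sent, e.g. from a site module; a sketch assuming a hypothetical modules\SiteModule bootstrapped by Craft:

<?php
namespace modules;

use yii\base\Event;
use yii\base\Module;
use yii\web\Response;

// Hypothetical site module: remove the plugin's blocking header from every
// response, right before it is sent.
class SiteModule extends Module
{
    public function init(): void
    {
        parent::init();

        Event::on(Response::class, Response::EVENT_BEFORE_SEND, function (Event $event) {
            /** @var Response $response */
            $response = $event->sender;
            $response->getHeaders()->remove('x-robots-tag');
        });
    }
}
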
RyanRoberts commented 1 year ago

Got hit with this too the other day, again.

klick commented 1 year ago

The bug is still there. Thank you @jesuismaxime!

kevinmu17 commented 1 year ago

Sorry Ether, but the level of support for this plugin is starting to get ridiculous. We also have big sites hit by this issue, again. Many developers have offered you money, support, or PRs, but it seems you are leaving us in the dark here.

zzseba78 commented 1 year ago

Same issue here using Craft 4 (latest) and SEO plugin 4.0.3 (latest); Google is not indexing any page because of x-robots-tag: none, noimageindex.

CRAFT_ENVIRONMENT=production
DISALLOW_ROBOTS=false
DEV_MODE=false

This could ruin any website's SEO strategy. Using the latest version, and no solution for this yet??

In this case it was fixed by adding the ENVIRONMENT variable (https://github.com/ethercreative/seo/issues/432):

CRAFT_ENVIRONMENT=production
ENVIRONMENT=production

RyanRoberts commented 1 year ago

The plugin is now labeled as no longer maintained https://plugins.craftcms.com/seo

I'd recommend SEOMate.

zzseba78 commented 1 year ago

It seems they are resuming work on the plugin: https://github.com/ethercreative/seo/issues/447#issuecomment-1498974519