TryGhost / Ghost

Independent technology for modern publishing, memberships, subscriptions and newsletters.
https://ghost.org
MIT License

Urls blocked in robots.txt should not appear in the sitemap #6964

Closed vikin91 closed 8 years ago

vikin91 commented 8 years ago

The issue is very simple: I want to block some urls from being indexed, so I add a corresponding line to robots.txt. One example is an "Impressum" page, which is required in Germany and contains private data that I don't want indexed. I exclude it in robots.txt, but it still appears in my sitemap-pages. Google Search Console complains that I block some urls even though they are listed in the sitemap.

Proposal: resources blocked in robots.txt should be excluded from sitemap.xml. This would require a robots.txt parser, but I bet some existing parsers can already do this.

ErisDS commented 8 years ago

The issue is very simple

And yet it involves a parser... 😂 😂

I think the problem is relatively simple to understand here:

I want to block some urls from being indexed

Adding something to your robots.txt is one way to do this. Right now, that's supported: add a robots.txt file to your theme and you get what you need.

Another way to do this is using the robots meta tag, which is described here as being a potentially better mechanism. In your particular case, I think this might be a better solution to the problem?

In Ghost terms, you could add a new template, page-impressum.hbs, which outputs the extra meta tags.
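A minimal sketch of such a template, assuming the theme's default.hbs layout prints a named content block with {{{block "robots"}}} inside its <head> (the block name "robots" is an arbitrary choice for illustration, not something Ghost defines). Ghost picks page-impressum.hbs for the static page whose slug is impressum, {{!< default}} wraps it in the layout, and {{#contentFor}} fills the block:

```handlebars
{{!< default}}
{{!-- page-impressum.hbs: used only for the page with the slug "impressum" --}}

{{!-- Fills the hypothetical "robots" block that default.hbs is assumed to print in <head> --}}
{{#contentFor "robots"}}
    <meta name="robots" content="noindex">
{{/contentFor}}

{{!-- Normal page body, rendered inside the layout's {{{body}}} --}}
{{#post}}
<article class="{{post_class}}">
    <h1>{{title}}</h1>
    {{content}}
</article>
{{/post}}
```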

The suggestion that we should parse the robots.txt file and use that to determine what's in the sitemap breaks one of the cardinal rules: Ghost themes should not be able to change the behaviour of Ghost. They are not apps/plugins that get to control what Ghost does - they are just lightweight "skins" which handle the output of data.

To implement this, I think we'd need a different system for marking routes that should not be indexed, so that those routes are output in robots.txt and removed from sitemap.xml. For example, we could do this at the post/page level.

However, I think at this point, this is a feature request which really belongs over on our wishlist. We use this list to help determine what goes into the roadmap and how to prioritise features.

vikin91 commented 8 years ago

I agree with you that such a parser would violate the rule (which I like) about separating templates from the core.

If this goes to the wishlist, then I would phrase it differently: Add a checkbox for every post/page saying "include this page in the sitemap" or "hide this page". The checkbox would be checked by default.

ErisDS commented 8 years ago

@vikin91 On the wishlist, I would always choose to phrase things as the problem rather than the solution. That way people can add comments about similar problems, and we can design the best possible solution.

So instead, I would simply put "Add a way to block some urls from being indexed"

This is just a suggestion, but if you look at the list, problems get more traction than solutions 😉

vikin91 commented 8 years ago

Thanks!

Where can I read more about the correct way of achieving

add a new template page-impressum.hbs which outputs the extra meta tags?

And how do I specify that a given post should be rendered using the template I want rather than the other one? This smells like if($url =~ /impressum/){impressum template} else{ normal template }.

ErisDS commented 8 years ago

See the docs here: http://themes.ghost.org/docs/templates#section-page-hbs

vikin91 commented 8 years ago

OK, half of it works. But how should page-impressum.hbs inject something into the <head> section if it does not contain one? Maybe add something like this to default.hbs?

{{#is "page-impressum"}}
   <!-- this should be only in impressum -->
{{/is}}

Docs on that seem to be missing :(

ErisDS commented 8 years ago

There's a guide on the dev blog about using content blocks. We need to get some docs about this added to the main theme docs.
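As a rough sketch of the layout side of that content-blocks approach: default.hbs reserves a named slot in <head> with the block helper, and any template that starts with {{!< default}} can fill it with {{#contentFor "robots"}}...{{/contentFor}}. On every other page the slot renders nothing. The block name "robots" is a hypothetical choice, and the markup is a trimmed-down layout, not any particular theme's file:

```handlebars
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>{{meta_title}}</title>

    {{!-- Named slot: empty unless the rendered template uses contentFor "robots" --}}
    {{{block "robots"}}}

    {{ghost_head}}
</head>
<body class="{{body_class}}">
    {{!-- The page/post template's own output is inserted here --}}
    {{{body}}}

    {{ghost_foot}}
</body>
</html>
```

With this pattern, default.hbs never needs to check which page it is rendering; the page template pushes its own tags into the layout.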

Gonna close this now. Feel free to link the wishlist item here when you create it, and if you have any further questions about theming, please head over to the #themes channel of our Slack team.