gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0

sitemap.xml as a plain text - bug in declaration in urlset #10515

Open poizon opened 1 year ago

poizon commented 1 year ago

What version of Hugo are you using (hugo version)?

$ hugo version
hugo v0.104.3-58b824581360148f2d91f5cc83f69bd22c1aa331+extended linux/amd64 BuildDate=2022-10-04T14:25:23Z VendorInfo=gohugoio

Does this issue reproduce with the latest release?

I found a bug in the default sitemap template (for multilingual sites). After generating the sitemap (e.g. https://example.org/en/sitemap.xml) and opening it in a browser, you see the XML as plain text, because the urlset declaration is not valid.

Solution

You need to add these declarations to the urlset tag in the sitemap.xml template:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.w3.org/TR/xhtml11/xhtml11_schema.html http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd"
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/TR/xhtml11/xhtml11_schema.html">

jmooring commented 1 year ago

The problem is a 301 redirect with:

http://www.w3.org/1999/xhtml
wget http://www.w3.org/1999/xhtml

--2022-12-09 15:20:18--  http://www.w3.org/1999/xhtml
Resolving www.w3.org (www.w3.org)... 104.18.23.19, 104.18.22.19, 2606:4700::6812:1613, ...
Connecting to www.w3.org (www.w3.org)|104.18.23.19|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.w3.org/1999/xhtml [following]
--2022-12-09 15:20:18--  https://www.w3.org/1999/xhtml
Connecting to www.w3.org (www.w3.org)|104.18.23.19|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.w3.org/1999/xhtml/ [following]
--2022-12-09 15:20:18--  https://www.w3.org/1999/xhtml/
Reusing existing connection to www.w3.org:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘xhtml’

idarek commented 1 year ago

The default sitemap template for sitemap.xml in Hugo contains:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">

Changing it to

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/TR/xhtml11/xhtml11_schema.html">

makes sitemaps in subfolders display in the browser exactly like the root one.

As mentioned here: https://stackoverflow.com/questions/16798979/xsd-for-sitemap-with-hreflang

But then Google Search Console will start complaining about an incorrect namespace.

First, it's worth asking whether we need to fix this at all. I don't think we do. Search engines like Google read these files, not users. I understand that users may want to preview them, but overall I am sticking with the defaults.

jmooring commented 1 year ago

@idarek The PR I submitted (#10516) fixes this trivial redirect problem, and adheres to both the sitemap protocol and Google's recommendations for multilingual sites.
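For readers unfamiliar with the multilingual sitemap layout being discussed, here is a minimal Python sketch of a urlset that declares both namespaces and lists each translation as an xhtml:link alternate. It is not Hugo's template; the build_sitemap helper, its input shape, and the example.org URLs are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Bind prefixes so the output uses xmlns="..." and xmlns:xhtml="...".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(pages):
    """Hypothetical helper: pages is a list of {language code: URL} dicts."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for translations in pages:
        for loc in translations.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            # Every entry lists all language versions, including itself.
            for alt_lang, alt_loc in translations.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": alt_lang, "href": alt_loc},
                )
    return ET.tostring(urlset, encoding="unicode")


# Example input (made-up URLs).
xml_text = build_sitemap(
    [{"en": "https://example.org/en/", "de": "https://example.org/de/"}]
)
print(xml_text)
```

The namespace URIs here are only names bound to prefixes; nothing in the script fetches them.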

idarek commented 1 year ago

Works locally. Fails in production (Cloudflare Pages).

Works on both on my end. I copied the templates from the Hugo GitHub repo into my layouts directory, and everything works fine now.

McShelby commented 1 year ago

I also commented on the changeset:

An XML namespace is usually not meant for browsing. The expected value should use an http:// prefix, as mentioned in the docs.

Switching to https:// may result in unexpected behaviour (e.g. tools that consume those XML files may have trouble with this).

jmooring commented 1 year ago

@McShelby

I agree. Given that the host + path is unique, I incorrectly assumed that the protocol was irrelevant. Looking at the spec, it is indeed a full string comparison.
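To illustrate the string-comparison point, here is a small Python sketch using the standard library's xml.etree.ElementTree. It parses the same sitemap fragment twice, once with the http:// namespace name and once with https://, and counts the alternate links a consumer would find when it looks them up under the http:// name. The TEMPLATE fragment, the count_alternates helper, and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sitemap fragment with a placeholder for the xhtml namespace name.
TEMPLATE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
    'xmlns:xhtml="{xhtml_ns}">'
    "<url><loc>https://example.org/en/</loc>"
    '<xhtml:link rel="alternate" hreflang="de" href="https://example.org/de/"/>'
    "</url></urlset>"
)


def count_alternates(xhtml_ns):
    """Count link elements visible under the http:// XHTML namespace name."""
    root = ET.fromstring(TEMPLATE.format(xhtml_ns=xhtml_ns))
    # The lookup matches the namespace name as an exact string.
    return len(root.findall(".//{http://www.w3.org/1999/xhtml}link"))


http_count = count_alternates("http://www.w3.org/1999/xhtml")
https_count = count_alternates("https://www.w3.org/1999/xhtml")
print(http_count, https_count)  # prints "1 0"
```

The parser never dereferences the namespace URI, so the 301 redirect has no effect on parsing; only the exact characters of the name matter.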

John Mueller of Google, in this thread, states:

Yes, it's essentially an identifier. Google accepts both, since people tend to use them interchangeably nowadays.

But just because Google does, it does not mean that everyone else does. I will revert https://github.com/gohugoio/hugo/commit/3fd0b78498597ceb343b7fda2e9b652f3e957478.

In the future, please open a new issue instead of commenting on a commit or a closed issue. Thanks.