alanorth / hugo-theme-bootstrap4-blog

A blogging-centric Bootstrap v4 theme for the Hugo static site generator.

Character escape issue with schema dot org and Google Crawler #130

Closed bgidley closed 4 years ago

bgidley commented 4 years ago

The schema.org script does not handle special characters in the blog title. For example, my blog is called Gidley's Gossipings, and the theme outputs:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Blog",
  "headline": "Gidley\x27s Gossipings",
  "url" : "https:\/\/www.gidley.co.uk\/",
  "author": {
    "@type": "Person",
    "name": "Ben Gidley"
  },
  "dateModified": "2019-03-16T15:33:15\x2b00:00",
  "keywords": "fintech,old,tech,amp,attack-trees,banks,blogging,complex,crypto,cv,defense-in-depth,does-it-exist-yet,fintech,google,iot,luck,malware,me,mitm,old,payments,paytv,phone,photography,psd2,security,ssl,",
  "description": "A blog about not much really"
}
</script>

The Google crawler does not accept the \x27 escape sequence in the title and refuses to spider the page.

The JSON spec (https://www.json.org/json-en.html) does not define \x as an escape sequence; the only hexadecimal escape it allows is \u followed by four hex digits.
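
A quick way to see the difference is to feed both forms to a strict JSON parser. The sketch below uses Python's standard `json` module; the helper name `is_valid_json` and the sample strings are made up for illustration, and this says nothing about how Google's crawler is actually implemented.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Raw strings so the backslashes reach the JSON parser unchanged.
hex_escaped = r'{"headline": "Gidley\x27s Gossipings"}'        # \x27 is not a JSON escape
unicode_escaped = r'{"headline": "Gidley\u0027s Gossipings"}'  # \u0027 is a JSON escape
unescaped = '{"headline": "Gidley\'s Gossipings"}'             # a plain apostrophe is fine
slash_escaped = r'{"url": "https:\/\/www.gidley.co.uk\/"}'     # \/ is allowed by the spec

for sample in (hex_escaped, unicode_escaped, unescaped, slash_escaped):
    print(is_valid_json(sample), sample)
```

Only the `\x27` form is rejected, so the `\/` in the `url` field is not part of the problem, while the `\x2b` in `dateModified` would fail for the same reason as the title.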

This may be an issue in Hugo itself - there doesn't seem to be a way to control the escaping logic.

alanorth commented 4 years ago

@bgidley make sure you've updated your theme to at least v1.4.1. I fixed this specific issue with JSON-LD in 85abd20ba7bea1c49a16fe5f8f8481afb581641e. I just tested by modifying my own blog's title to add an apostrophe and it is no longer escaped.
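
For anyone who wants to check their own generated pages after updating, here is a minimal sketch that pulls every `application/ld+json` block out of an HTML string and tries to parse it. It uses only the Python standard library; the names `JsonLdExtractor` and `check_jsonld` are hypothetical, and it does not reflect what the commit above changed in the theme.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the current block.
        if self._in_jsonld:
            self.blocks[-1] += data


def check_jsonld(html_text):
    """Return one entry per JSON-LD block: None if it parses, else the error message."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    results = []
    for block in parser.blocks:
        try:
            json.loads(block)
            results.append(None)
        except json.JSONDecodeError as exc:
            results.append(str(exc))
    return results


# Illustrative pages: one with the \x27 escape, one with a plain apostrophe.
page_before = r'<head><script type="application/ld+json">{"headline": "Gidley\x27s Gossipings"}</script></head>'
page_after = '<head><script type="application/ld+json">{"headline": "Gidley\'s Gossipings"}</script></head>'

print(check_jsonld(page_before))
print(check_jsonld(page_after))
```

A page is fine when every entry in the returned list is `None`; any other entry is the parser's error message for that block.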