CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Broken image sources in Hugo with base64 encoded images #607

Closed: paranerd closed this issue 2 months ago

paranerd commented 2 months ago

I just found that images in the search results on my Hugo site are broken. After some extensive debugging, I found that the issue is likely caused by how image sources are parsed when src is a base64 data URI:

Hugo renders

<img src="data:{{ .MediaType.Type }};base64,{{ .Content | base64Encode }}" />

into HTML where the base64 string contains &#43; (and other &#XX; entities) instead of +.

I confirmed this by checking the generated files in public/.

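This is not a problem anywhere else on the site, because the browser decodes the &#XX; entities back into + when it parses the HTML.

However, it breaks pagefind's meta.image, which contains the raw (malformed?) base64 string with the entities still encoded. Assigning that string to an image's src from JavaScript does not decode the entities, so the image data is invalid.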

A workaround for me was:

img['src'] = resultData.meta.image.replaceAll(/&#\d\d;/g, '+');

which brings back the images.
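The regex above turns every two-digit numeric entity into +, which is only correct for &#43;; an entity such as &#61; ("=") would also become +. The rough Go illustration below compares that blanket substitution with decoding every entity via the standard library's html.UnescapeString. The helper names and the meta.image value are hypothetical, made up for the comparison.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// twoDigitEntity matches the same pattern as the JavaScript workaround:
// any numeric character reference with exactly two digits.
var twoDigitEntity = regexp.MustCompile(`&#\d\d;`)

// blanketFix (hypothetical) mirrors the workaround: every match becomes "+".
func blanketFix(s string) string {
	return twoDigitEntity.ReplaceAllString(s, "+")
}

// decodeEntities (hypothetical) turns each entity back into the character
// it actually encodes.
func decodeEntities(s string) string {
	return html.UnescapeString(s)
}

func main() {
	// Made-up meta.image value containing &#43; ("+") and &#61; ("=").
	escaped := "data:image/png;base64,&#43;&#43;&#43;&#61;"
	fmt.Println(blanketFix(escaped))     // data:image/png;base64,++++
	fmt.Println(decodeEntities(escaped)) // data:image/png;base64,+++=
}
```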

Nevertheless, I think this may be a bug worth investigating. I'm not sure whether this is something pagefind should handle or whether I should raise it with Hugo. Let me know.

Thank you!

paranerd commented 2 months ago

Interesting fact: the issue occurs on my MacBook and on an Ubuntu server, but NOT on Cloudflare, where I deploy. The Hugo and pagefind versions are the same on all systems; however, Cloudflare uses the Go build (the MacBook uses Homebrew, Ubuntu uses Snap).

Maybe this helps.

paranerd commented 2 months ago

The Hugo team came back with the solution.

Tell Hugo's template engine that src is a safe HTML attribute, so that it is written out without escaping, like this:

{{ with resources.Get "a.jpg" }}
  {{ $src := printf "data:%s;base64,%s" .MediaType.Type (.Content | base64Encode) }}
  <img {{ printf "src=%q" $src | safeHTMLAttr }}>
{{ end }}
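Hugo's safeHTMLAttr marks the string as a trusted template.HTMLAttr, which Go's html/template writes into the tag verbatim. This is a minimal Go sketch of the same idea, assuming a hypothetical renderTrusted helper and a made-up three-byte payload that encodes to "++++"; it builds the attribute with fmt.Sprintf("src=%q", ...) just like the printf call in the template.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"html/template"
	"strings"
)

// renderTrusted (hypothetical helper) mirrors
// `printf "src=%q" $src | safeHTMLAttr`: the complete attribute is wrapped
// in template.HTMLAttr, which html/template inserts without escaping.
func renderTrusted(mediaType string, content []byte) (string, error) {
	src := fmt.Sprintf("data:%s;base64,%s", mediaType,
		base64.StdEncoding.EncodeToString(content))
	attr := template.HTMLAttr(fmt.Sprintf("src=%q", src))

	tmpl, err := template.New("img").Parse(`<img {{ . }}>`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, attr)
	return sb.String(), err
}

func main() {
	// These three bytes encode to the base64 text "++++".
	out, err := renderTrusted("image/png", []byte{0xfb, 0xef, 0xbe})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // <img src="data:image/png;base64,++++">
}
```

Because safeHTMLAttr switches off escaping for the whole attribute, it fits values the site controls, such as the encoded content of a local resource.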