CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Different encodings not handled (windows-1252) #723

Closed: csaftoiu closed this issue 3 hours ago

csaftoiu commented 3 hours ago

I have a site which for historical reasons has to use windows-1252 encoding.

The HTML page has this at the top:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta charset="windows-1252">

The page is properly encoded and displays correctly in web browsers, including, for example, this em dash:

goes some way to answer that question — taking it step by step will be helpful

Yet in the search result preview, the dash is not displayed correctly:

[Screenshot: search result preview with the dash rendered incorrectly]

On the search page itself, I've tried both windows-1252 and UTF-8 encoding, with the same result.
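A likely explanation, assuming the indexer reads HTML files as UTF-8 (an assumption; the issue does not say how Pagefind decodes its input), is easy to reproduce. In windows-1252 the em dash is the single byte 0x97, which is not valid UTF-8 on its own, so a lenient UTF-8 decoder replaces it with U+FFFD. Because this would happen when the index is built, changing the search page's own encoding could not repair it, which matches the behaviour described above. The snippet below is only a sketch of that mechanism:

```python
# Sentence from the page; \u2014 is the em dash.
text = "goes some way to answer that question \u2014 taking it step by step"

# Saved as windows-1252 (Python's "cp1252" codec), the em dash is byte 0x97.
raw = text.encode("cp1252")
assert b"\x97" in raw

# Reading those bytes as UTF-8: 0x97 is a stray continuation byte, so
# errors="replace" substitutes U+FFFD (the replacement character).
misread = raw.decode("utf-8", errors="replace")
print(misread)

# Decoding with the correct codec recovers the original sentence.
print(raw.decode("cp1252") == text)  # True
```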


Relatedly, many of the pages use this encoding without declaring it in a <meta> tag at all.

If there are no existing settings to handle this, where would be a good place to start digging in to make it work?

csaftoiu commented 3 hours ago

Solved by converting the files to UTF-8 on the command line beforehand.
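The comment does not say which tool was used (iconv is one common choice). The following is a minimal Python sketch of the same pre-processing step, under the assumption that every file that is not already valid UTF-8 is windows-1252. It also rewrites or adds the charset declaration, an extra step not mentioned in the thread: once the bytes are UTF-8, a leftover `windows-1252` declaration, or no declaration at all, could make browsers mis-decode the page. The helper names `convert_page` and `convert_tree` are hypothetical.

```python
import re
from pathlib import Path

# Existing windows-1252 declaration, in <meta charset="..."> or
# <meta http-equiv=... content="...; charset=..."> form.
DECL_RE = re.compile(rb'(<meta[^>]*charset\s*=\s*["\']?)windows-1252', re.IGNORECASE)
# Any charset declaration at all.
ANY_DECL_RE = re.compile(rb"<meta[^>]*charset", re.IGNORECASE)
# An opening <head> tag (deliberately does not match <header>).
HEAD_RE = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)


def convert_page(data: bytes) -> bytes:
    """Re-encode one page as UTF-8, assuming non-UTF-8 input is windows-1252."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (or pure ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
    # errors="replace" maps them to U+FFFD instead of raising.
    out = data.decode("cp1252", errors="replace").encode("utf-8")
    if ANY_DECL_RE.search(out):
        # Only windows-1252 declarations are rewritten; others are left as-is.
        return DECL_RE.sub(rb"\1utf-8", out)
    # No declaration: add one right after <head>, if the page has that tag.
    return HEAD_RE.sub(lambda m: m.group(0) + b'<meta charset="utf-8">', out, count=1)


def convert_tree(root: Path) -> None:
    """Convert every .html file under root in place (run before indexing)."""
    for path in root.rglob("*.html"):
        path.write_bytes(convert_page(path.read_bytes()))
```

Skipping files that already decode as UTF-8 makes the conversion safe to run more than once; without that check, a second run would double-encode every non-ASCII character.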