Open oliverklee opened 1 week ago
We can follow what browsers do: https://developer.mozilla.org/en-US/docs/Web/CSS/@charset
> We can follow what browsers do: https://developer.mozilla.org/en-US/docs/Web/CSS/@charset
Yes, good idea. Though browsers have a `Content-Type` header that may include a `charset=` specifier that we don’t have (as well as the resolved charset of the referring document). But we can definitely follow what browsers do absent `charset=`.
> Though browsers have a `Content-Type` header that may include a `charset=` specifier that we don’t have (as well as the resolved charset of the referring document).
We can use the value provided to `Settings::withDefaultCharset` in its place.
From https://github.com/MyIntervals/PHP-CSS-Parser/pull/688#issuecomment-2330767391:
> In essence: have some heuristic to determine the input encoding (BOM, `@charset`, try a few common charsets and pick the first one that doesn’t produce errors), then convert to UTF-8 and, from that point on, all the tokens of interest to us will be ASCII-only and can be parsed using regular string functions.
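
To make the heuristic concrete, here is a rough sketch of the detection order discussed above. It is written in Python only to keep it short and is not the parser's actual code. The names `detect_and_decode`, `default_charset` and `candidates` are made up for illustration; `default_charset` stands in for whatever was passed to `Settings::withDefaultCharset`, and the candidate list is an arbitrary assumption. The handling of a UTF-16 label in `@charset` follows my reading of the CSS Syntax rules (a file whose first bytes are readable as ASCII cannot really be UTF-16).

```python
import codecs
import re

# Byte order marks and the encodings they signal.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# The rule only counts if it is the very first thing in the file and uses
# exactly this syntax: @charset "label";
_CHARSET_RULE = re.compile(rb'\A@charset "([A-Za-z0-9._:\-]{1,40})";')


def detect_and_decode(raw, default_charset="utf-8",
                      candidates=("utf-8", "windows-1252")):
    """Return (text, encoding_used) for the raw bytes of a stylesheet.

    Hypothetical helper: the order is BOM, then @charset, then the
    configured default, then a short list of common charsets.
    """
    # 1. A BOM wins over everything else.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name, errors="replace"), name

    to_try = []

    # 2. A leading @charset rule, if present.
    match = _CHARSET_RULE.match(raw)
    if match:
        label = match.group(1).decode("ascii")
        squashed = label.lower().replace("-", "").replace("_", "")
        if squashed in ("utf16", "utf16be", "utf16le"):
            # The rule was readable as ASCII bytes, so the file is not UTF-16.
            label = "utf-8"
        to_try.append(label)

    # 3. The configured default, then 4. a few common charsets.
    to_try.append(default_charset)
    to_try.extend(candidates)

    for name in to_try:
        try:
            # The first encoding that decodes without errors is used.
            return raw.decode(name), name
        except (LookupError, UnicodeError):
            continue  # unknown label or invalid byte sequence

    # Last resort: never fail, but mark undecodable bytes.
    return raw.decode("utf-8", errors="replace"), "utf-8"
```

The result is a Unicode string (the equivalent of "convert to UTF-8" in PHP), so everything after this step can look for `{`, `}`, `;`, `@` and friends without caring about the original encoding. One caveat with the "first one that doesn’t produce errors" approach: single-byte charsets such as windows-1252 accept almost any byte sequence, so they only make sense at the end of the list.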