MyIntervals / PHP-CSS-Parser

A Parser for CSS Files written in PHP. Allows extraction of CSS files into a data structure, manipulation of said structure and output as (optimized) CSS
http://www.sabberworm.com/blog/2010/6/10/php-css-parser
MIT License

Add a heuristic for determining the charset #706

Open oliverklee opened 1 week ago

oliverklee commented 1 week ago

From https://github.com/MyIntervals/PHP-CSS-Parser/pull/688#issuecomment-2330767391:

> In essence: have some heuristic to determine the input encoding (BOM, @charset, try a few common charsets and pick the first one that doesn’t produce errors), then convert to UTF-8 and, from that point on, all the tokens of interest to us will be ASCII-only and can be parsed using regular string functions.
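
A minimal sketch of the heuristic described above, to make the discussion concrete. The function names `detectCssCharset` and `cssToUtf8`, the candidate list, and the fallback parameter are all hypothetical and not part of PHP-CSS-Parser; the sketch assumes the `mbstring` extension and only recognises `@charset` names that appear literally in `mb_list_encodings()` (aliases are ignored).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not library API): guess the charset of raw CSS bytes.
 * Returns null when none of the steps yields an answer.
 */
function detectCssCharset(string $css, array $candidates = ['UTF-8', 'ISO-8859-1']): ?string
{
    // Step 1: a byte order mark wins over everything else.
    $boms = ["\xEF\xBB\xBF" => 'UTF-8', "\xFE\xFF" => 'UTF-16BE', "\xFF\xFE" => 'UTF-16LE'];
    foreach ($boms as $bom => $charset) {
        if (strncmp($css, $bom, strlen($bom)) === 0) {
            return $charset;
        }
    }

    // Step 2: an @charset rule starting at the very first byte of the file.
    if (preg_match('/\A@charset "([A-Za-z0-9._:-]+)";/', $css, $matches) === 1) {
        $known = array_map('strtolower', mb_list_encodings());
        if (in_array(strtolower($matches[1]), $known, true)) {
            return $matches[1];
        }
    }

    // Step 3: pick the first candidate under which the bytes are valid.
    foreach ($candidates as $candidate) {
        if (mb_check_encoding($css, $candidate)) {
            return $candidate;
        }
    }

    return null;
}

/**
 * Hypothetical helper: convert raw CSS bytes to UTF-8 without a leading BOM.
 */
function cssToUtf8(string $css, string $fallbackCharset = 'UTF-8'): string
{
    $charset = detectCssCharset($css) ?? $fallbackCharset;
    $utf8 = (string) mb_convert_encoding($css, 'UTF-8', $charset);

    // After conversion a BOM, if any, is the UTF-8 encoding of U+FEFF; drop it.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }

    return $utf8;
}
```

Because ISO-8859-1 accepts any byte sequence, placing it last in the candidate list means step 3 always produces an answer; a stricter list would let the fallback charset come into play.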

oliverklee commented 1 week ago

We can follow what browsers do: https://developer.mozilla.org/en-US/docs/Web/CSS/@charset

sabberworm commented 1 week ago

> We can follow what browsers do: https://developer.mozilla.org/en-US/docs/Web/CSS/@charset

Yes, good idea. Though browsers have a Content-Type header that may include a charset= specifier that we don’t have (as well as the resolved charset of the referring document). But we can definitely follow what browsers do absent charset=.

JakeQZ commented 1 week ago

> Though browsers have a Content-Type header that may include a charset= specifier that we don’t have (as well as the resolved charset of the referring document).

We can use the value provided to Settings::withDefaultCharset in its place.
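
One possible reading of this, sketched as a precedence chain: browsers consult the BOM first, then the Content-Type charset, then @charset, then the referring document's encoding, and finally fall back to UTF-8. The function `resolveCssCharset` and its parameters are hypothetical; the sketch simply places a configured charset in the slot where browsers use Content-Type, and assumes the caller passes null when no charset was configured explicitly. Whether the configured value should really outrank @charset is a design decision this sketch does not settle.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not library API): resolve the charset in browser-like order.
 *
 * @param string|null $bomCharset        charset implied by a byte order mark, if any
 * @param string|null $configuredCharset explicitly configured charset, standing in for Content-Type
 * @param string|null $atRuleCharset     charset named in a leading @charset rule, if any
 */
function resolveCssCharset(?string $bomCharset, ?string $configuredCharset, ?string $atRuleCharset): string
{
    // Each source is used only when every source before it is absent.
    return $bomCharset ?? $configuredCharset ?? $atRuleCharset ?? 'UTF-8';
}
```

With this ordering, `resolveCssCharset(null, 'ISO-8859-1', 'UTF-8')` yields the configured `ISO-8859-1`, while a BOM still overrides both.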