A localized web page that declares its encoding in a `meta` tag within its `head` won't be decoded correctly. There are actually several problems:

1. To trigger any content-conversion logic, `HtmlParser::parse()` must be called with the `input` parameter given as `List<int>` or `Uint8List`. When the input is given as a `string`, it is always assumed to be UTF-8 encoded, which produces wrong text.
2. The charset declared in the `meta` tag is currently ignored by `HtmlParser` even when the input is passed as `List<int>`. Internally, `ContentAttrParser::parse()` reads the unquoted `charset` content as an empty string.
3. Encoding detection assumes the declaration is located within the first 512 bytes, and this limit can't be changed via any parameter, so the `meta` tag is still skipped in some cases.
4. Even if the buggy behavior is fixed, the code crashes later in the `_decodeBytes()` method of `html_input_stream.dart`, because currently only UTF-8 and ASCII encodings are supported. I understand that those are the only two supported by Dart for now, but there is not even a way to inject a custom decoder to handle this encoding, and the code ends up with an `ArgumentError`.
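
To make the last two problems concrete (the fixed 512-byte limit and the missing decoder hook), here is a minimal sketch of the behavior I would expect. It uses only `dart:convert` and is not the `html` package's actual code: `ByteDecoder`, `decoders`, `sniffCharset`, `decodeHtml`, and the `sniffLimit` parameter are hypothetical names, and the `iso-8859-1` tag in `main` is an illustrative stand-in rather than the tag from the affected page.

```dart
import 'dart:convert';

/// Hypothetical signature for a pluggable decoder: bytes in, text out.
typedef ByteDecoder = String Function(List<int> bytes);

/// Hypothetical registry. Callers could add entries for encodings that
/// dart:convert does not ship, for example a hand-written code-page table.
final Map<String, ByteDecoder> decoders = {
  'utf-8': (bytes) => utf8.decode(bytes, allowMalformed: true),
  'us-ascii': (bytes) => ascii.decode(bytes, allowInvalid: true),
  'iso-8859-1': (bytes) => latin1.decode(bytes),
};

/// Looks for `charset=...` within the first [sniffLimit] bytes.
/// Accepts both quoted and unquoted values; returns null if none is found.
String? sniffCharset(List<int> bytes, {int sniffLimit = 512}) {
  final end = bytes.length < sniffLimit ? bytes.length : sniffLimit;
  // latin1 maps every byte to exactly one code unit, so this cannot throw.
  final head = latin1.decode(bytes.sublist(0, end));
  final match = RegExp(
    r'''charset\s*=\s*["']?([A-Za-z0-9_.:-]+)''',
    caseSensitive: false,
  ).firstMatch(head);
  return match?.group(1)?.toLowerCase();
}

/// Decodes [bytes] with the registered decoder for the sniffed charset,
/// falling back to UTF-8 when no declaration is found.
String decodeHtml(List<int> bytes, {int sniffLimit = 512}) {
  final charset = sniffCharset(bytes, sniffLimit: sniffLimit) ?? 'utf-8';
  final decoder = decoders[charset];
  if (decoder == null) {
    throw ArgumentError('No decoder registered for "$charset"');
  }
  return decoder(bytes);
}

void main() {
  // Illustrative page; the charset value inside `content` is unquoted.
  final page = latin1.encode(
    '<head><meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1"></head><p>café</p>',
  );
  print(sniffCharset(page)); // prints: iso-8859-1
  print(decodeHtml(page).contains('café')); // prints: true
}
```

Until something like this exists in the parser, a caller could decode the bytes this way and then hand over the resulting string, which should bypass the parser's own byte decoding. Registering an additional entry in `decoders` is the kind of injection point the last problem asks for.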