AngleSharp / AngleSharp.Css

Library to enable support for cascading stylesheets in AngleSharp.
https://anglesharp.github.io
MIT License

Old encoding problem: how to solve? #177

Closed sgf closed 3 weeks ago

sgf commented 4 weeks ago

Description

'iso-2022-cn' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method. Arg_ParamName_Name

Steps to Reproduce

Read an HTML document with a gb2312 charset, like the one in my screenshot.

Expected Behavior

ParseDocument succeeds.

Actual Behavior

image

Possible Solution / Known Workarounds

No response

sgf commented 4 weeks ago

I'm not sure why AngleSharp throws the exception. There may be two reasons:

  1. AngleSharp detects that the current OS encoding is "iso-2022-cn", so it throws the exception.
  2. AngleSharp detects that the HTML document's meta charset is "iso-2022-cn", so it throws the exception.

For reason 1: I have no idea.

For reason 2:

  If the parameter is something like a byte array, then AngleSharp should detect the encoding, and may need to check the meta charset.

  But I have already decoded the bytes with a (custom) Encoding into a .NET string (UTF-16),
  so I pass this .NET string as the ParseDocument parameter.
  I hope that when ParseDocument(htmlString) is called directly, AngleSharp does not check the charset meta tag in the HTML,

  because the programmer has already done that themselves (the .NET string is already UTF-16 encoded).

sgf commented 3 weeks ago

Solved: the exception just needs to be ignored. Reason for the exception: AngleSharp tries to create an ISO-2022-CN encoding, but .NET can't do that.
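
For readers landing here, the workflow described in this thread (decode the gb2312 bytes yourself, then pass the resulting .NET string to ParseDocument) can be sketched roughly as follows. This is a minimal sketch, not the project's recommended approach: it assumes a recent .NET runtime where `CodePagesEncodingProvider` is available (older targets may need the System.Text.Encoding.CodePages package), and the class name `Gb2312Example`, the helper `DecodeGb2312`, and the sample bytes are made up for illustration. It does not address whether AngleSharp still probes for "iso-2022-cn" internally.

```csharp
using System;
using System.Linq;
using System.Text;
using AngleSharp.Html.Parser;

public static class Gb2312Example
{
    // Hypothetical helper: decodes raw GB2312 bytes into a .NET (UTF-16) string.
    public static string DecodeGb2312(byte[] bytes)
    {
        // On .NET Core / .NET 5+, legacy code pages such as gb2312 are only
        // available after the code pages provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("gb2312").GetString(bytes);
    }

    public static void Main()
    {
        // Stand-in for bytes read from a file or an HTTP response (assumption).
        byte[] head = Encoding.ASCII.GetBytes("<html><head><meta charset=\"gb2312\"><title>");
        byte[] title = { 0xD6, 0xD0, 0xCE, 0xC4 }; // two Chinese characters encoded as GB2312
        byte[] tail = Encoding.ASCII.GetBytes("</title></head><body></body></html>");
        byte[] raw = head.Concat(title).Concat(tail).ToArray();

        // Decode first, so the parser receives an already-decoded string.
        string html = DecodeGb2312(raw);

        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        // Print the decoded title to confirm the text survived the round trip.
        Console.WriteLine(document.Title);
    }
}
```

If the debugger still stops on the "iso-2022-cn" exception with code like this, the resolution reported above (ignoring that exception) is the only fix this thread confirms.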