aadsm / jschardet

Character encoding auto-detection in JavaScript (port of Python's chardet)
GNU Lesser General Public License v2.1

ISO 8859 not detected in this case #31

Open bpasero opened 7 years ago

bpasero commented 7 years ago

Run detection on the attached file. The result will be windows-1252.

iso-8859-1.txt


aadsm commented 7 years ago

iso-8859-1 is not currently supported by this library (it could be a todo though). windows-1252 is a superset of iso-8859-1, so it should be fine. What specific problem is this causing?

demetriusnunes commented 6 years ago

We're also having issues when detecting ISO-8859-1 as windows-1252. We're willing to implement iso-8859-1 support and create a PR for this. Would you point us in the right direction?

aadsm commented 6 years ago

The Python version of this library implemented it this way (https://github.com/chardet/chardet/pull/100/files), so I would probably follow the exact same logic. Only a small range of characters differs between these two encodings, and this change assumes one encoding until it has evidence that this range of characters is being used.
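
For anyone picking this up, here is a minimal sketch of that idea, not the code from the linked PR. It relies on one fact: bytes 0x80 to 0x9F are printable characters in windows-1252 but unused C1 control codes in ISO-8859-1. The helper name `refineLatin1Guess` is hypothetical and not part of jschardet's API, and treating ISO-8859-1 as the default until such a byte shows up is an assumption about how the logic would be ported.

```javascript
// Hypothetical helper, not part of jschardet: refine a "Latin-1 family" guess.
// Assumption: report ISO-8859-1 by default and switch to windows-1252 only
// when a byte in 0x80..0x9F appears, since that range holds printable
// characters in windows-1252 but only C1 control codes in ISO-8859-1.
function refineLatin1Guess(bytes) {
  for (const b of bytes) {
    if (b >= 0x80 && b <= 0x9f) {
      return "windows-1252"; // evidence that the differing range is in use
    }
  }
  return "ISO-8859-1"; // no evidence, keep the narrower encoding
}

// "café" with é encoded as 0xE9: valid and identical in both encodings.
const plainLatin1 = [0x63, 0x61, 0x66, 0xe9];
// "hi" wrapped in curly quotes (0x93, 0x94), which only windows-1252 defines.
const withCurlyQuotes = [0x93, 0x68, 0x69, 0x94];

console.log(refineLatin1Guess(plainLatin1));
console.log(refineLatin1Guess(withCurlyQuotes));
```

A check like this could only run after the detector has already settled on a single-byte Latin encoding; it cannot tell these two apart from other encodings on its own.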

tahv0 commented 5 years ago

@demetriusnunes did you make that PR?

Is ISO-8859-1 supported yet?