StractOrg / stract

web search done right
https://stract.com
GNU Affero General Public License v3.0

Use more sophisticated encoding detection when utf8 decoding fails. #172

Closed. mikkeldenker closed this 6 months ago

mikkeldenker commented 6 months ago

Closes #137.

Some websites, especially older ones, use an encoding other than UTF-8 or Latin-1. Previously, we simply tried different encodings until one decoded the bytes successfully. This approach can fail unexpectedly, because bytes written in one encoding can often be decoded as another encoding without any error being reported, which silently yields garbled text. We now use the encoding detection crate 'chardetng', which also seems to be the one used in Firefox.
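
To illustrate the failure mode described above, here is a minimal, std-only sketch. It is not the code from this PR: `decode_latin1` and `naive_decode` are hypothetical helpers that mimic a "try encodings until one succeeds" strategy with only UTF-8 and Latin-1. Because every byte is a valid Latin-1 character, the fallback never reports an error, even when the input is actually windows-1251.

```rust
/// Decode bytes as Latin-1 (ISO-8859-1). Each byte maps to the Unicode
/// code point with the same value, so this can never fail.
/// (Hypothetical helper, not taken from the stract codebase.)
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Hypothetical "try until one works" decoder: UTF-8 first, then Latin-1.
fn naive_decode(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.to_owned(),
        // Latin-1 accepts any byte sequence, so no error is ever surfaced.
        Err(_) => decode_latin1(bytes),
    }
}

fn main() {
    // The Russian word "Привет" encoded as windows-1251.
    let bytes = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2];

    // These bytes are not valid UTF-8, so the fallback kicks in and
    // "succeeds", producing mojibake instead of the original text.
    let decoded = naive_decode(&bytes);
    println!("{decoded}"); // prints "Ïðèâåò"
}
```

A statistical detector such as 'chardetng' avoids this by guessing the encoding from the byte patterns themselves rather than relying on a decoder to report an error.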