Ruby v2.7.7 conflicts with Charlock_holmes converting file from euc_kr (or CP949) to UTF-8

brianmario / charlock_holmes

Character encoding detection, brought to you by ICU

MIT License


For anyone hitting a similar issue: the core problem is that Charlock Holmes cannot always determine the encoding with certainty, especially when the file contains only a small amount of data.

My workaround is a simple decision rule based on two inputs: the confidence level that Charlock Holmes reports and the language header that the user specifies.
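A minimal sketch of that kind of decision rule, not my exact code. It assumes the detection result is the hash returned by `CharlockHolmes::EncodingDetector.detect` (with `:encoding` and `:confidence` keys). The threshold of 80, the `LANGUAGE_FALLBACKS` table, and the helper names `choose_encoding` and `to_utf8` are made up for illustration, so tune them for your own data.

```ruby
# Hypothetical mapping from the user's language header to a fallback encoding.
LANGUAGE_FALLBACKS = {
  "ko" => "CP949",
  "ja" => "Shift_JIS"
}.freeze

# Assumed cut-off; Charlock Holmes reports confidence on a 0-100 scale.
CONFIDENCE_THRESHOLD = 80

# detection: a hash such as { encoding: "EUC-KR", confidence: 95 }, or nil.
# language_header: a user-supplied value such as "ko" or "ko-KR", or nil.
def choose_encoding(detection, language_header)
  detected   = detection && detection[:encoding]
  confidence = detection ? detection[:confidence].to_i : 0

  # Trust the detector only when it is confident enough.
  return detected if detected && confidence >= CONFIDENCE_THRESHOLD

  # Otherwise fall back to the language header, then to the low-confidence
  # guess, and finally to UTF-8.
  lang = language_header.to_s.downcase[0, 2]
  LANGUAGE_FALLBACKS.fetch(lang) { detected || "UTF-8" }
end

# Convert raw bytes from the chosen source encoding to UTF-8 using Ruby's
# built-in transcoding. Bytes that cannot be converted are replaced.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding)
       .encode("UTF-8", invalid: :replace, undef: :replace)
end
```

In real use you would pass `CharlockHolmes::EncodingDetector.detect(content)` as the first argument. Note that some encoding names the detector returns may not be names Ruby's `String#encode` recognizes, so you may need to map those as well.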

Combining the two resolved the issue for me, and it gives a reasonable way to pick the encoding when there is too little data for detection alone.

Feel free to adopt this strategy if you run into similar encoding problems, and please share your own approach too!

Ruby v2.7.7 conflicts with Charlock_holmes converting file from euc_kr (or CP949) to UTF-8 #168