Hi, we've had some issues with charset detection on some websites. The current BUbiNG implementation lacks a regex that detects HTML5 charset declarations, such as <meta charset="utf-8">.
We've implemented that, along with a fallback that uses ICU's probabilistic charset detection (which adds a dependency on ICU).
I think the HTML5 charset detection could easily be submitted to the main repo. What is your position on a potential dependency on ICU?
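
To give an idea of the HTML5 part, here is a minimal sketch of the kind of regex-based detection we mean. It is an illustration only, not the actual patch: the class name Html5CharsetSniffer, the method detect and the pattern itself are hypothetical and do not come from the BUbiNG code base. It assumes the page is in an ASCII-compatible encoding, so that the markup can be read before the real charset is known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only; not part of BUbiNG. */
final class Html5CharsetSniffer {

    // Matches charset=VALUE inside a <meta ...> tag, with VALUE unquoted,
    // double-quoted or single-quoted. This covers the HTML5 form
    // <meta charset="utf-8"> and also happens to catch the older
    // content="text/html; charset=utf-8" form.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*?charset\\s*=\\s*[\"']?([^\\s\"'/>;]+)",
            Pattern.CASE_INSENSITIVE);

    private Html5CharsetSniffer() {}

    /**
     * Scans the first bytes of a page for a charset declaration.
     * Returns an empty Optional when nothing usable is found, so the
     * caller can move on to another strategy.
     */
    static Optional<Charset> detect(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so ASCII markup
        // stays readable whatever the real (ASCII-compatible) encoding is.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        while (m.find()) {
            String name = m.group(1);
            try {
                if (Charset.isSupported(name)) {
                    return Optional.of(Charset.forName(name));
                }
            } catch (IllegalArgumentException e) {
                // Syntactically illegal charset name: ignore it and keep scanning.
            }
        }
        return Optional.empty();
    }
}
```

When detect returns an empty Optional, the caller would fall through to the next strategy; in our case that is the ICU-based detector, which is where the extra dependency comes from.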