cld2 only accepts a subset of UTF-8, called "Interchange valid UTF-8"
(https://tools.ietf.org/html/rfc5198), which excludes control
characters. (We hit this error frequently in production.)
When such a character is found, bytes_found is not set, so use the
number of bytes in the input instead, as is done in
DetectLanguageCheckUTF8.
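
The fallback described above can be sketched roughly as follows. This is
an illustration only, not the code in this change: valid_prefix_length
and effective_bytes_found are hypothetical names, and the control
character check is a deliberate simplification that does not reproduce
cld2's actual validity rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for an interchange-validity scan.
// Returns the length of the leading run of bytes that contains no C0
// control character other than tab, LF, or CR. This is an assumption
// made for illustration, not cld2's real check.
std::size_t valid_prefix_length(const std::string& buffer) {
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(buffer[i]);
        const bool allowed_whitespace = (c == '\t' || c == '\n' || c == '\r');
        if (c < 0x20 && !allowed_whitespace) {
            return i;  // first rejected byte
        }
    }
    return buffer.size();
}

// Hypothetical helper: if the input was rejected, the detector's
// bytes_found cannot be trusted, so fall back to the buffer length.
int effective_bytes_found(const std::string& buffer, int reported_bytes_found) {
    if (valid_prefix_length(buffer) < buffer.size()) {
        return static_cast<int>(buffer.size());
    }
    return reported_bytes_found;
}
```

With this shape, a caller never sees an unset bytes_found: rejected
input reports its full length, and accepted input keeps whatever the
detector reported.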