AviSynth / AviSynthPlus

AviSynth with improvements
http://avs-plus.net

StrLen() wrong results for non-latin text. #373

Closed Uzver123 closed 11 months ago

Uzver123 commented 1 year ago

If I use StrLen("Hello"), it works correctly and returns 5. However, with non-Latin text the result is twice as large: for example, StrLen("ЯЩЙ") returns 6, not 3. The text I used is Cyrillic.

As a workaround, is there a way to determine whether text is Latin?

Also, the function UCase() does not work with non-Latin characters.

pinterf commented 1 year ago

StrLen is unaware of the real letter count. In your case it returns the byte count, because I suppose UTF-8 encoding is used there, in which each Cyrillic letter takes two bytes. See also: https://forum.doom9.org/showthread.php?t=174459

As for UCase: it works only for English characters.
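To illustrate the difference between byte count and letter count, here is a minimal sketch in Python (not AviSynth script). The helper name `utf8_letter_count` is hypothetical and is not part of AviSynth+; it counts UTF-8 code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```python
def utf8_letter_count(data: bytes) -> int:
    """Count code points in UTF-8 data by skipping continuation bytes (0b10xxxxxx).

    Hypothetical helper for illustration only; not an AviSynth+ function.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


latin = "Hello".encode("utf-8")
cyrillic = "ЯЩЙ".encode("utf-8")

# len() on bytes is the byte count, which is what a byte-based length reports.
print(len(latin), utf8_letter_count(latin))        # 5 5
print(len(cyrillic), utf8_letter_count(cyrillic))  # 6 3
```

This counts code points, not what a reader would perceive as letters, so combining marks would still be counted separately.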

Uzver123 commented 1 year ago

Is there a way to check whether the letters inside a string are Latin or non-Latin characters?

There seems to be no support for regular expressions (regex) in AviSynth+ that would let me strip all non-Latin characters from a string, and I don't know any other way of checking whether a string is Latin or in another language.
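One possible approach to this check, sketched in Python rather than AviSynth script, relies on a property of UTF-8: every byte of a multi-byte character has its high bit set, so a string is plain ASCII exactly when all bytes are below 0x80. The helper names `is_ascii_only` and `strip_non_ascii` are hypothetical, and the sketch treats only basic ASCII as "Latin", so accented Latin letters such as "é" would count as non-Latin here.

```python
def is_ascii_only(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the data is plain ASCII.

    Hypothetical helper for illustration only; not an AviSynth+ function.
    """
    return all(b < 0x80 for b in data)


def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte with the high bit set, removing all multi-byte UTF-8 sequences."""
    return bytes(b for b in data if b < 0x80)


print(is_ascii_only("Hello".encode("utf-8")))       # True
print(is_ascii_only("ЯЩЙ".encode("utf-8")))         # False
print(strip_non_ascii("Hi ЯЩЙ!".encode("utf-8")))   # b'Hi !'
```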

pinterf commented 1 year ago

Or try the StrToUtf8/StrFromUtf8 functions before StrLen; they convert an 8-bit (ANSI) string to UTF-8 and back.
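Why converting to an 8-bit encoding changes the length can be shown with a small Python sketch (not AviSynth script). It assumes the system ANSI code page is Windows-1251 (Cyrillic), which is only an assumption for this example; with a code page that has no Cyrillic letters, the conversion would not preserve the text.

```python
text = "ЯЩЙ"

utf8_bytes = text.encode("utf-8")    # two bytes per Cyrillic letter
ansi_bytes = text.encode("cp1251")   # assumed ANSI code page: one byte per letter

print(len(utf8_bytes))  # 6
print(len(ansi_bytes))  # 3

# The round trip back from the 8-bit encoding restores the original text.
round_trip = ansi_bytes.decode("cp1251")
print(round_trip == text)  # True
```

In an 8-bit encoding that covers the script, the byte count and the letter count coincide, so a byte-based length gives the expected result.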

pinterf commented 11 months ago

StrLen counts bytes and cannot cope with UTF-8 text or with other special Unicode features.