koheiw closed this issue 6 years ago.
Interesting, thanks. But again, I wonder whether this is still the case with the latest ICU. Perhaps they have already updated the Unicode Character Database...
I guess `stri_replace_all_regex(txt, "[\\p{Z}\\p{C}\\p{S}\\p{P}\\p{M}]", ' ')` almost does the trick.
The remaining problem is the "nose" character, U+0296 (http://www.fileformat.info/info/unicode/char/0296/index.htm).
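A minimal sketch of why the "nose" survives, assuming stringi is installed and using a hypothetical sample string (the text from the original post is not reproduced here). U+0296 (LATIN LETTER INVERTED GLOTTAL STOP) is classified as a letter in the Unicode Character Database, so none of the `\p{Z}`, `\p{C}`, `\p{S}`, `\p{P}` or `\p{M}` classes match it. Listing it explicitly with `\x{0296}` is only an illustrative tweak, not something proposed in this thread.

```r
library(stringi)

# Pattern from the comment above: separators, control/other,
# symbols, punctuation and combining marks
pattern <- "[\\p{Z}\\p{C}\\p{S}\\p{P}\\p{M}]"

# Hypothetical sample: a "lenny face" built from escapes
# "(", space, U+0361 (mark), degree sign (symbol), space,
# U+035C (mark), U+0296 (letter), space, U+0361, degree sign, ")"
txt <- "( \u0361\u00b0 \u035c\u0296 \u0361\u00b0)"

# U+0296 is a letter: \p{L} matches it, the pattern above does not
stri_detect_regex("\u0296", "\\p{L}")  # TRUE
stri_detect_regex("\u0296", pattern)   # FALSE

# Deleting every match of the pattern leaves only the "nose"
left_over <- stri_replace_all_regex(txt, pattern, "")

# Illustrative tweak: add U+0296 to the character class explicitly,
# replace matches with spaces, then trim the surrounding whitespace
pattern2 <- "[\\p{Z}\\p{C}\\p{S}\\p{P}\\p{M}\\x{0296}]"
cleaned <- stri_trim_both(stri_replace_all_regex(txt, pattern2, " "))
```

Adding `\p{L}` instead would also remove the nose, but it would strip every ordinary letter from real text, so naming the single code point keeps the rest of the text intact.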
I found an interesting text on the internet that contains a lot of non-printing characters. I tried to clean it using `stri_replace_all_regex`, but it did not work. This seems like a bug. Probably for the same reason, these do not return anything.