adah1972 / libunibreak

The libunibreak library
zlib License

utf8 and utf16 functions that output breaks per code-point #30

Closed mbechard closed 4 years ago

mbechard commented 4 years ago

I think it would be useful to have variants of the utf8 and utf16 functions that output a brks array with one entry per code point, instead of one per code unit as the current ones do. That is, they would give the same results as the utf32 version, but accept utf8 or utf16 input. In some situations I want to be able to consume the output 'brks' without having to think about what my source encoding was. It seems simple to add: put an if around the loop that increments posLast and sets LINEBREAK_INSIDEACHAR, and in the per-code-point case just increment posLast once per iteration instead.
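To illustrate the output I am after, here is a rough sketch of a post-processing step that gets the same per-code-point array from the existing per-code-unit output. `compact_brks_per_code_point` is a hypothetical helper, not part of libunibreak, and the sketch assumes that each code point contributes exactly one entry that is not LINEBREAK_INSIDEACHAR. The constant is redefined locally only so the snippet compiles on its own; real code would include linebreak.h instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the value in libunibreak's linebreak.h; defined here
 * only so this sketch compiles without the library header. */
#ifndef LINEBREAK_INSIDEACHAR
#define LINEBREAK_INSIDEACHAR 3
#endif

/* Hypothetical helper (not a libunibreak function).
 *
 * Copies every entry of unit_brks that is not LINEBREAK_INSIDEACHAR into
 * point_brks, preserving order, and returns the number of entries written.
 * Under the assumption stated above, that count is the number of code
 * points, and point_brks is indexed per code point.
 *
 * point_brks must have room for unit_len entries. It may alias unit_brks,
 * because the write index never runs ahead of the read index. */
static size_t compact_brks_per_code_point(const char *unit_brks,
                                          size_t unit_len,
                                          char *point_brks)
{
    size_t n = 0;

    assert(unit_brks != NULL && point_brks != NULL);

    for (size_t i = 0; i < unit_len; ++i) {
        if (unit_brks[i] != LINEBREAK_INSIDEACHAR) {
            point_brks[n++] = unit_brks[i];
        }
    }
    return n;
}
```

A caller would run set_linebreaks_utf8 or set_linebreaks_utf16 as usual and then pass the resulting brks array through this helper. The variant proposed above would skip the extra pass by never writing the LINEBREAK_INSIDEACHAR entries in the first place.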

mbechard commented 4 years ago

I can create a pull request if this is something you are interested in adding to the API.

adah1972 commented 4 years ago

I am not sure, but I will definitely look at a PR.