c-cube / printbox

print nested boxes, lists, arrays, tables in several formats
https://c-cube.github.io/printbox/
BSD 2-Clause "Simplified" License

Better Unicode advice #4

dbuenzli closed this issue 5 years ago

dbuenzli commented 5 years ago

The advice here is very inaccurate. Unless you only care about US-ASCII, counting scalar values will not give you a good visual length, so please do not spread that idea.
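To make the point concrete, here is a minimal sketch using only the OCaml standard library (assuming OCaml 4.14 or later for String.get_utf_8_uchar; the name count_scalar_values is hypothetical). It counts scalar values, and the sample strings show how that count drifts from what a terminal actually draws.

```ocaml
(* Requires OCaml 4.14 or later (String.get_utf_8_uchar). *)

(* Count the Unicode scalar values in a UTF-8 encoded string. Each malformed
   sequence is counted once, since the stdlib decoder reports it as a single
   invalid decode (meant to be replaced by U+FFFD). *)
let count_scalar_values (s : string) : int =
  let rec loop i n =
    if i >= String.length s then n
    else
      let d = String.get_utf_8_uchar s i in
      (* utf_decode_length is always >= 1, so the loop always advances. *)
      loop (i + Uchar.utf_decode_length d) (n + 1)
  in
  loop 0 0

let () =
  (* "abc": 3 scalar values, 3 columns. Prints 3. *)
  Printf.printf "%d\n" (count_scalar_values "abc");
  (* "e" + U+0301 COMBINING ACUTE ACCENT: 2 scalar values, yet drawn as a
     single "é" in one column. Prints 2. *)
  Printf.printf "%d\n" (count_scalar_values "e\u{301}");
  (* U+65E5 U+672C ("日本"): 2 scalar values, but these are wide characters
     that terminals usually draw across 4 columns. Prints 2. *)
  Printf.printf "%d\n" (count_scalar_values "\u{65E5}\u{672C}")
```

The count is right for ASCII and wrong in both directions elsewhere: too high for combining sequences, too low for wide characters.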

Better advice would be to count grapheme clusters (e.g. using Uuseg_string) and/or to sum, over the scalar values, @pqwy's carefully crafted Uucp.Break.tty_width_hint, whose doc string also gives a nice overview of the challenges of measuring width in a terminal.
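A sketch of both suggestions, assuming the uutf, uucp and uuseg opam packages are installed (Uuseg_string lives in the uuseg.string library). The names tty_width and grapheme_count are hypothetical. Clamping negative hints to 0 and giving a malformed sequence a width of 1 are choices made for this sketch, not something the libraries prescribe.

```ocaml
(* Assumes the uutf, uucp and uuseg.string libraries, e.g. in a dune stanza:
   (libraries uutf uucp uuseg.string). *)

(* Approximate terminal width: sum Uucp.Break.tty_width_hint over the scalar
   values. Negative hints (control characters) are clamped to 0, and each
   malformed sequence counts as width 1 (as if rendered as U+FFFD). *)
let tty_width (s : string) : int =
  Uutf.String.fold_utf_8
    (fun acc _byte_index -> function
      | `Uchar u -> acc + max 0 (Uucp.Break.tty_width_hint u)
      | `Malformed _ -> acc + 1)
    0 s

(* Number of user-perceived characters (extended grapheme clusters). *)
let grapheme_count (s : string) : int =
  Uuseg_string.fold_utf_8 `Grapheme_cluster (fun n _cluster -> n + 1) 0 s

let () =
  List.iter
    (fun s ->
      Printf.printf "%S: width %d, %d grapheme(s)\n" s (tty_width s)
        (grapheme_count s))
    [ "abc"; "e\u{301}"; "\u{65E5}\u{672C}" ]
```

The two measures answer different questions: "e" followed by U+0301 is one grapheme in one column, while "日本" is two graphemes spread over four columns, so a layout engine aligning columns wants the width, not the grapheme count.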

c-cube commented 5 years ago

I'm trying to improve that, and stumbled upon the following:

let str = "aéo\nπ/2\nτ/4";;
Uutf.String.fold_utf_8 ~pos:5 ~len:4 (fun _ _ _ -> ()) () str;;

raises

Exception: Invalid_argument "String.sub / Bytes.sub".
Raised at file "pervasives.ml", line 33, characters 25-45
Called from file "src/uutf.ml", line 54, characters 33-57
Called from file "src/uutf.ml", line 730, characters 33-52

even though String.sub str 5 4 works fine. Am I using it wrong?
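For reference, here is a sketch of what that call should visit, written against the OCaml standard library (4.14 or later) rather than uutf. The name fold_utf_8_range is hypothetical, and the folder only mimics the shape of a Uutf folder: accumulator, absolute byte index, then `Uchar or `Malformed. Decoding from a String.sub copy keeps the decoder from reading past pos + len.

```ocaml
(* Requires OCaml 4.14 or later. A stdlib-only stand-in, not Uutf's code. *)

(* Fold [f] over the scalar values encoded in bytes [pos, pos + len) of [s].
   [f] receives the accumulator, the absolute byte index of the value, and
   either `Uchar u or `Malformed bytes. *)
let fold_utf_8_range ~pos ~len f acc s =
  (* String.sub raises Invalid_argument if the range is out of bounds; a
     multi-byte sequence cut by pos + len is then reported as malformed. *)
  let sub = String.sub s pos len in
  let rec loop i acc =
    if i >= len then acc
    else
      let d = String.get_utf_8_uchar sub i in
      let n = Uchar.utf_decode_length d in
      let v =
        if Uchar.utf_decode_is_valid d then `Uchar (Uchar.utf_decode_uchar d)
        else `Malformed (String.sub sub i n)
      in
      loop (i + n) (f acc (pos + i) v)
  in
  loop 0 acc

let () =
  let str = "aéo\nπ/2\nτ/4" in
  fold_utf_8_range ~pos:5 ~len:4
    (fun () i -> function
      | `Uchar u -> Printf.printf "byte %d: U+%04X\n" i (Uchar.to_int u)
      | `Malformed b -> Printf.printf "byte %d: malformed %S\n" i b)
    () str
```

On that string, bytes 5 to 8 hold "π/2", so the fold should visit π at byte 5, "/" at byte 7 and "2" at byte 8, with no exception.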

dbuenzli commented 5 years ago

Humpf, looking at uutf's repo, the latest commit seems to be a fix for exactly that, made a year ago... It seems it never made it into a release.

If you can confirm that your problems are gone with:

opam pin uutf --dev

I'll gladly push out a release.

c-cube commented 5 years ago

Yes, it does now work on branch fix-4, with a new test. I used uutf+uucp.