GiovineItalia / Compose.jl

Declarative vector graphics
http://giovineitalia.github.io/Compose.jl/latest/
Other
248 stars 83 forks source link

Fix handling of non-UTF-8 string length for pango layout #394

Closed tomyun closed 4 years ago

tomyun commented 4 years ago

Fix for #393 as suggested by @ScottPJones.

Specific test cases for non-UTF-8 string would require import some external deps like LegacyStrings.jl or Strs.jl thus has not been made into this commit. The originally intended fix for UTF-8 unicode strings could be still picked up by a test case implemented in #393.

codecov-io commented 4 years ago

Codecov Report

Merging #394 into master will not change coverage. The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #394   +/-   ##
=======================================
  Coverage   41.75%   41.75%           
=======================================
  Files          18       18           
  Lines        3365     3365           
=======================================
  Hits         1405     1405           
  Misses       1960     1960           
Impacted Files Coverage Δ
src/pango.jl 33.87% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7a6a0a4...2ea9dc5. Read the comment docs.

bjarthur commented 4 years ago

just to make sure i understand, we could also just simply input -1 instead of sizeof(...), right? see https://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-set-markup

also, are all the other uses of codeunits in Compose.jl okay? i think convert(String,...) might be needed in these cases too to support UTF16/32 IIUC. perhaps that's for another PR.

$ grep -r codeunits src
src/fontfallback.jl:        input = codeunits(textline)
src/pgf_backend.jl:    input = codeunits(escape_tex_chars(text))
src/pango.jl:    textarray = codeunits(text)
src/pango.jl:        input = codeunits(unsafe_string(c_stripped_text[]))
tomyun commented 4 years ago
  1. I'm not sure if Julia string byte array is null-terminated by default.

  2. For other uses of codeunits, we may need more testing. However, note that UTF16/32String from LegacyStrings.jl and UTF16/32Str from Strs.jl don't seem to be fully compatible with string-related functions we use.

julia> s = "aaβαbb"

julia> using LegacyStrings
julia> s1 = utf32(s)

julia> using Strs
julia> s2 = UTF32Str(s)
julia> codeunits(s)
8-element Base.CodeUnits{UInt8,String}:
 0x61
 0x61
 0xce
 0xb2
 0xce
 0xb1
 0x62
 0x62

julia> codeunits(s1)
7-element Base.CodeUnits{UInt32,UTF32String}:
Error showing value of type Base.CodeUnits{UInt32,UTF32String}:
ERROR: MethodError: no method matching codeunit(::UTF32String, ::Int64)

julia> codeunits(s2)
6-element Base.CodeUnits{UInt32,UTF32Str}:
 0x00000061
 0x00000061
 0x000003b2
 0x000003b1
 0x00000062
 0x00000062
julia> match(r"α", s)
RegexMatch("α")

julia> match(r"α", s1)
ERROR: ArgumentError: regex matching is only available for the String type; use String(s) to convert

julia> match(r"α", s2)
ERROR: type Regex has no field extra
bjarthur commented 4 years ago

ok. i'm fine with this then. julia doesn't support utf16/32 well so no need to go to great effort in Compose.

@ScottPJones does this look right to you?