britzl / gooey

Defold GUI system
MIT License

Trying to erase non-UTF-8 characters throws errors #55

Open · Jerakin opened this issue 5 years ago

Jerakin commented 5 years ago

Easily reproduced with an emoji such as 😄 while typing on a phone.

@britzl How do you think we should handle this?

ERROR:SCRIPT: /gooey/internal/utf8.lua:128: Invalid UTF-8 character
stack traceback:
    [C]: in function 'error'
    /gooey/internal/utf8.lua:128: in function 'utf8charbytes'
    /gooey/internal/utf8.lua:190: in function 'utf8len'
    /gooey/internal/utf8.lua:207: in function 'sub'
    /gooey/internal/input.lua:168: in function 'input'
    /gooey/gooey.lua:162: in function 'input'
    /screens/change_pokemon/edit/edit.gui_script:150: in function </screens/change_pokemon/edit/edit.gui_script:146>
britzl commented 5 years ago

What happens when you type an emoji? I assume the font you use doesn't render it, or do you actually have a font that renders the emoji?

britzl commented 5 years ago

I tried the utf8 module in the same way the backspace key is handled, and it generates the correct result:

assert(utf8.sub("a😄b", 1, -2) == "a😄")
assert(utf8.sub("a😄", 1, -2) == "a")
Jerakin commented 5 years ago

Odd. Doing the same within Gooey gives me this call stack.

ERROR:SCRIPT: /gooey/internal/utf8.lua:128: Invalid UTF-8 character
stack traceback:
    [C]: in function 'error'
    /gooey/internal/utf8.lua:128: in function 'utf8charbytes'
    /gooey/internal/utf8.lua:226: in function 'sub'
    /gooey/internal/input.lua:153: in function 'input'
    /gooey/gooey.lua:162: in function 'input'
    /example/dirtylarry.gui_script:45: in function 'fn'
    /gooey/gooey.lua:184: in function 'group'
    /example/dirtylarry.gui_script:16: in function </example/dirtylarry.gui_script:15>

What I am doing is super straightforward. Simply inputting a smiley triggers it in this case.

  1. Open up dmengine.apk
  2. Target phone and build
  3. Go to dirty larry
  4. Click top input box, switch to the "emoji keyboard"
  5. Type emoji
britzl commented 4 years ago

Some more investigation into emojis: in Java, an emoji can't be represented as a single UTF-16 character; it actually needs two. The emoji is converted into a surrogate pair, one high and one low code unit.

https://developers.redhat.com/blog/2019/08/16/manipulating-emojis-in-java-or-what-is-%F0%9F%90%BB-1/
http://www.unicode.org/versions/Unicode12.1.0/ch03.pdf#G2630

In Defold we simply take the codepoints of the two UTF-16 characters, encode these to UTF-8, and forward them to Lua.
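To make the mechanics concrete, here is a small sketch (in Python purely for illustration; none of this is Defold or gooey code) showing how 😄 (U+1F604) splits into a surrogate pair, and why encoding each surrogate code unit to UTF-8 on its own produces bytes that a strict UTF-8 validator, such as gooey's utf8charbytes, will reject:

```python
# U+1F604 is outside the Basic Multilingual Plane, so UTF-16
# represents it as a surrogate pair of two 16-bit code units.
cp = 0x1F604

# Split into high and low surrogates per the UTF-16 encoding rule.
v = cp - 0x10000
high = 0xD800 + (v >> 10)      # 0xD83D
low  = 0xDC00 + (v & 0x3FF)    # 0xDE04

# Recombining the pair recovers the original code point.
recombined = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
assert recombined == cp

# Correct UTF-8 encodes the code point as one 4-byte sequence.
print("😄".encode("utf-8").hex())   # f09f9884

def encode_3byte(u):
    """Encode a 16-bit value with the UTF-8 3-byte pattern
    1110xxxx 10xxxxxx 10xxxxxx (valid only for non-surrogates)."""
    return bytes([0xE0 | (u >> 12),
                  0x80 | ((u >> 6) & 0x3F),
                  0x80 | (u & 0x3F)])

# Encoding each surrogate code unit separately, as described above,
# yields two 0xED-prefixed 3-byte sequences. Surrogate code points
# are not permitted in UTF-8, so a strict decoder rejects them.
bogus = encode_3byte(high) + encode_3byte(low)
print(bogus.hex())                   # eda0bdedb884
try:
    bogus.decode("utf-8")
except UnicodeDecodeError:
    print("strict decoder rejects the per-surrogate encoding")
```

This matches the stack trace: utf8charbytes sees a leading byte 0xED followed by a continuation byte in the surrogate range and raises "Invalid UTF-8 character".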

To be honest, I'm still not sure how this should be handled, and I've already spent too much time on this. Investigation will have to continue later.
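One possible direction, sketched below purely as an illustration (in Python for brevity; I'm not claiming this is the right layer for the fix, which would presumably live in the engine or in gooey's utf8 module): detect the per-surrogate 3-byte pattern described above (essentially what CESU-8 produces) and recombine each high/low pair into the proper 4-byte UTF-8 sequence before the string reaches utf8.sub:

```python
def repair_surrogates(data: bytes) -> bytes:
    """Replace per-surrogate 3-byte encodings (ED A0..AF xx followed
    by ED B0..BF xx) with the proper 4-byte UTF-8 sequence for the
    combined code point. Other bytes pass through unchanged."""
    out = bytearray()
    i = 0
    while i < len(data):
        # A 3-byte sequence starting with 0xED encodes U+D000..U+DFFF;
        # look for a high surrogate immediately followed by a low one.
        if (i + 5 < len(data)
                and data[i] == 0xED and 0xA0 <= data[i + 1] <= 0xAF
                and data[i + 3] == 0xED and 0xB0 <= data[i + 4] <= 0xBF):
            hi = 0xD000 | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F)
            lo = 0xD000 | ((data[i + 4] & 0x3F) << 6) | (data[i + 5] & 0x3F)
            cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
            out += chr(cp).encode("utf-8")
            i += 6
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

With input sanitized like this, the existing utf8len/utf8.sub code in gooey would only ever see well-formed UTF-8 and the backspace handling should behave as in the assertions earlier in this thread.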