dyne / Zenroom

Embedded no-code VM executing human-like language to manipulate data and process cryptographic operations.
https://dev.zenroom.org
GNU Affero General Public License v3.0

Support UTF8 in all string manipulation functions #434

Open jaromil opened 2 years ago

jaromil commented 2 years ago

HEAP buffer contents (values) should support UTF8, and UTF8-compliant string functions should be adopted wherever string data is modified (see CODEC).

The CODEC itself may include a utf8 or even utf16 property to classify such memory contents.

jaromil commented 2 years ago

Verified that UTF8 is already supported through all default Lua functions and OCTET conversion to/from string:

[*] Interactive console, press ctrl-d to quit.
print'⭐'
⭐
[*] Script successfully executed
ut = '⭐'
[*] Script successfully executed
print(ut)
⭐
[*] Script successfully executed
u = O.from_string('⭐')
[*] Script successfully executed
print(u)
4q2Q
[*] Script successfully executed
print(u:string())
⭐
[*] Script successfully executed
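
For reference, the 4q2Q printed above is consistent with base64 over the three UTF-8 bytes of '⭐' (U+2B50, encoded as E2 AD 90). A minimal C sketch of that arithmetic follows; base64_encode3 is a hypothetical helper written for this note, not a Zenroom function, and it assumes the console prints octets as base64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode exactly three bytes as four base64
 * characters plus a terminating NUL. Three input bytes fill four
 * 6-bit groups exactly, so no '=' padding is needed. */
void base64_encode3(const unsigned char *in, char *out) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned long v;

    assert(in != NULL && out != NULL);

    /* Pack the three bytes into one 24-bit value. */
    v = ((unsigned long)in[0] << 16) | ((unsigned long)in[1] << 8) | in[2];

    /* Emit the four 6-bit groups, most significant first. */
    out[0] = tbl[(v >> 18) & 0x3F];
    out[1] = tbl[(v >> 12) & 0x3F];
    out[2] = tbl[(v >> 6) & 0x3F];
    out[3] = tbl[v & 0x3F];
    out[4] = '\0';
}
```

Feeding it the bytes 0xE2, 0xAD, 0x90 yields the groups 56, 42, 54, 16, which map to the characters 4, q, 2, Q.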

Only the string manipulation functions (strtok, split...) need to be re-implemented to support UTF8.

This project may come in handy: https://github.com/sheredom/utf8.h
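
To illustrate what a UTF8-aware replacement would have to do differently, here is a rough sketch in plain standard C (it does not use utf8.h, and the names utf8_count and utf8_split_next are hypothetical, not existing Zenroom functions). Standard strtok treats its delimiter argument as a set of single bytes, so a multi-byte delimiter such as '⭐' would also split inside any other character that shares one of its bytes. The sketch instead matches the delimiter as a whole byte sequence, which is safe on valid UTF-8 because the encoding is self-synchronizing. It assumes valid, NUL-terminated UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8
 * string by skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8. */
size_t utf8_count(const char *s) {
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: return the next token of *cursor, splitting on
 * the whole byte sequence `delim` rather than on any single byte of it.
 * The buffer is modified in place (the delimiter's first byte is
 * overwritten with NUL). *cursor is advanced past the delimiter, or set
 * to NULL once the last token has been returned. Returns NULL when no
 * tokens remain. */
char *utf8_split_next(char **cursor, const char *delim) {
    char *start;
    char *hit;
    size_t dlen;

    assert(cursor != NULL && delim != NULL);

    start = *cursor;
    if (start == NULL) {
        return NULL;
    }

    dlen = strlen(delim);
    /* An empty delimiter never matches; the rest is one token. */
    hit = (dlen > 0) ? strstr(start, delim) : NULL;

    if (hit != NULL) {
        *hit = '\0';
        *cursor = hit + dlen;
    } else {
        *cursor = NULL;
    }
    return start;
}
```

For example, splitting the bytes of "a⭐b⭑c" on "⭐" (E2 AD 90) gives the tokens "a" and "b⭑c", whereas strtok with the same delimiter string would also cut into "⭑" (E2 AD 91), since both characters start with the bytes E2 AD. Splitting on ASCII-only delimiters is already safe with the byte-oriented functions, because ASCII bytes never occur inside a multi-byte UTF-8 sequence.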