The standard specifies a string as a sequence of characters and requires UTF-8 encoding.
When it comes to the LEN function, and other functions that need a string length, the standard is vague on the exact requirements, but it speaks of a number of characters. This means that the LEN of a string is not its size in bytes (or in WORDs for WSTRING) but the number of UTF characters it contains.
The same applies to functions like LEFT, so LEFT(myString,10) returns the first 10 characters, not the first 10 bytes.
Is this behaviour correct?
This is what the current branch 11-str does.
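To make the difference concrete, here is a minimal Rust sketch of the character-based reading. The helpers len_chars and left_chars are hypothetical names, not functions from the 11-str branch, and "character" is assumed here to mean a Unicode scalar value (a Rust char), which is only one possible interpretation of the standard's wording.

```rust
/// Hypothetical LEN: number of characters (Unicode scalar values), not bytes.
fn len_chars(s: &str) -> usize {
    s.chars().count()
}

/// Hypothetical LEFT(s, n): the first `n` characters of `s`.
fn left_chars(s: &str, n: usize) -> &str {
    // char_indices yields the byte offset at which each character starts.
    // The offset of character number `n` is where the slice has to end.
    match s.char_indices().nth(n) {
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s, // fewer than `n` characters: return the whole string
    }
}

fn main() {
    let s = "häßlich"; // 'ä' and 'ß' each take 2 bytes in UTF-8
    println!("bytes = {}, chars = {}", s.len(), len_chars(s));
    println!("LEFT(s, 3) = {}", left_chars(s, 3));
}
```

For this input the byte length is 9 while the character count is 7, and LEFT(s, 3) keeps 5 bytes. A byte-based LEFT with the same argument could cut in the middle of a multi-byte character.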
For reference, Rust uses bytes for similar functions, and its String::len is O(1).
Our implementation in the current design is O(N), as we need to find the null terminator.
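As a rough illustration of the cost, the sketch below scans a null-terminated UTF-8 buffer once. The function names and the buffer layout are assumptions made for this example, not the actual design. It suggests that, as long as the terminator has to be found anyway, counting characters instead of bytes stays O(N) and only adds a check per byte.

```rust
/// Hypothetical: byte length of a null-terminated buffer, found by scanning. O(N).
fn byte_len_nul(buf: &[u8]) -> usize {
    // If no terminator is present, fall back to the full buffer length.
    buf.iter().position(|&b| b == 0).unwrap_or(buf.len())
}

/// Hypothetical: character count of a null-terminated UTF-8 buffer. Also O(N).
/// Assumes the bytes before the terminator are valid UTF-8.
fn char_len_nul(buf: &[u8]) -> usize {
    buf.iter()
        .take_while(|&&b| b != 0)
        // Continuation bytes look like 0b10xxxxxx; every other byte starts a character.
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count()
}

fn main() {
    // Everything after the first 0 byte is ignored.
    let buf = "häßlich\0unused".as_bytes();
    println!("bytes = {}, chars = {}", byte_len_nul(buf), char_len_nul(buf));
}
```

Rust gets its O(1) byte length by storing the length next to the data rather than scanning for a terminator.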
@riederm ideas?