Indexing strings uses code points rather than grapheme clusters

fubark / cyber

Fast and concurrent scripting.

MIT License

1.14k stars 38 forks source link

Example code:

i = '👨‍👨‍👦‍👦c'.indexChar('c')
print 'Found char at {i}.' -- Found char at 7.

Indexing strings based on codepoint doesn't really make sense with regards to how unicode is actually rendered and I think it would be a shame to bake these semantics into the language. Yes this is how Python and many other languages have built their string APIs, however some languages such as Swift seem to have taken a somewhat different approach that makes doing the wrong thing in the presence of unicode a lot harder. See this blog post for an overview: https://www.mikeash.com/pyblog/friday-qa-2015-11-06-why-is-swifts-string-api-so-hard.html

I'm not trying to tell you how best to design your language or anything like that, I just want ensure you are aware of the tradeoff here.

fubark / cyber

Indexing strings uses code points rather than grapheme clusters #10