Closed ageron closed 3 weeks ago
The policy is usually to discuss first on the Exercism forum, as it is the most active place for such issues and has the most eyeballs. If you leave it here, it'll probably go nowhere, so head over to the forum and start a new thread.
The instructions of the micro-blog exercise say:
I understand that we want to keep things simple, but I think this is misleading. For example, in the Roc track, the instructions led some people to split the string into codepoints, when in fact there's a very simple function to split the string into graphemes instead. The tests pass in both cases because they only include graphemes composed of a single codepoint, but the codepoint-based solutions would fail if the tests included flags, characters with multiple diacritics, complex emojis, or basically any grapheme composed of multiple codepoints (i.e., extended grapheme clusters).
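To illustrate the failure mode, here is a minimal sketch in Python (chosen only for illustration; it is not the Roc solution). It assumes the exercise truncates a post to five characters, and the names `truncate_codepoints`, `FLAGS`, and `ACCENTED` are hypothetical. Python strings index by codepoint, so slicing reproduces the codepoint-based approach:

```python
def truncate_codepoints(text: str, limit: int = 5) -> str:
    """Keep the first `limit` codepoints (Python slices strings by codepoint).

    The default limit of 5 is an assumption about the exercise.
    """
    return text[:limit]


# Three flags (FR, DE, JP). Each flag is ONE grapheme made of TWO
# regional-indicator codepoints, so this string has 6 codepoints.
FLAGS = "\U0001F1EB\U0001F1F7" "\U0001F1E9\U0001F1EA" "\U0001F1EF\U0001F1F5"

# Five accented letters, each written as "e" + combining acute accent
# (U+0301): 5 graphemes, 10 codepoints.
ACCENTED = "e\u0301" * 5

# Single-codepoint graphemes: the codepoint approach looks correct.
assert truncate_codepoints("hello world") == "hello"

# Flags: the cut lands in the middle of the third flag, leaving a
# dangling regional indicator instead of a whole grapheme.
cut_flags = truncate_codepoints(FLAGS)
assert len(FLAGS) == 6 and len(cut_flags) == 5
assert cut_flags[-1] == "\U0001F1EF"  # first half of the JP flag only

# Diacritics: only three letters survive, and the last one lost its accent.
cut_accented = truncate_codepoints(ACCENTED)
assert cut_accented == "e\u0301e\u0301e"
```

A grapheme-aware solution would keep all three flags and all five accented letters, since both inputs are at most five graphemes long. Python's standard library has no grapheme segmentation, so the sketch only shows the problem; a real fix would rely on whatever grapheme-splitting function the track's language provides.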
In short: we shouldn't encourage people to work with codepoints when they can just as easily work with graphemes.
I suggest at least updating the instructions to cover graphemes, and also including some tests with extended grapheme clusters. If we're going to handle Unicode, we should try to handle all possible characters. Handling graphemes might be harder in some languages, but in that case those tracks can just disable the extended grapheme tests.
Edit: I'm happy to submit a PR if there's agreement on this issue.