Open Anders429 opened 3 years ago
`WordSubstring` and `SentenceSubstring` are both blocked, since the needed `unicode_word_indices()` and `unicode_sentence_indices()` are not currently available in `unicode-segmentation`. For now, just `char` and grapheme variants will be supported.
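To make the difference between the two supported variants concrete, here is a minimal std-only sketch. `char_count` and `first_chars` are hypothetical helpers written for illustration, not part of this crate. Grapheme handling would need `unicode-segmentation`, which is left out so the snippet stays dependency-free.

```rust
/// Hypothetical helper: number of `char`s (Unicode scalar values) in `s`.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

/// Hypothetical helper: the first `n` chars of `s`, cut on a char boundary.
fn first_chars(s: &str, n: usize) -> &str {
    // Byte offset where the n-th char starts, or the end of the string
    // if `s` has fewer than `n` chars.
    let end = s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    &s[..end]
}
```

For the input `"e\u{301}"` (an `e` followed by a combining acute accent), `char_count` reports two chars even though a reader sees a single character, and `first_chars(s, 1)` returns the bare `e` with the accent cut off. A grapheme-based substring would keep the pair together.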
There are different kinds of substrings that can be supported by this library. Currently, the implementation supports substrings with respect to `char`s, but some users will likely want substrings with respect to graphemes instead. Word and sentence substrings could also be supported using the relevant Unicode standards.

Altogether, I see the following substring variants being possible:
Since we are already looking at a breaking change with #9, the `Substring` trait can be renamed to `CharSubstring` (so there is no ambiguity between substring variants). The unicode-segmentation variants (grapheme, word, and sentence) can be guarded behind a `unicode` feature (or perhaps separate features for each?). The byte variant can be held off on for now, since it really isn't needed and presents issues with properly-formed strings.

This solution will give maximum clarity as to what this crate offers, and will give flexibility for users to choose from the various types of substrings offered.
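As a rough sketch of what the renamed trait could look like, the following uses only the standard library. The method name `char_substring`, its signature, and its clamping behavior are assumptions made for illustration. They are not taken from the crate's actual API.

```rust
/// Hypothetical shape for the renamed trait. The method name and
/// signature are assumptions, not the crate's real interface.
pub trait CharSubstring {
    /// Returns the substring covering chars `start..end`.
    fn char_substring(&self, start: usize, end: usize) -> &str;
}

impl CharSubstring for str {
    fn char_substring(&self, start: usize, end: usize) -> &str {
        // Map a char position to its byte offset, clamping positions
        // past the end to the length of the string.
        let byte_at = |n: usize| self.char_indices().nth(n).map_or(self.len(), |(i, _)| i);
        let (from, to) = (byte_at(start), byte_at(end));
        // An inverted or empty range yields "" instead of panicking.
        if from >= to { "" } else { &self[from..to] }
    }
}
```

With a name like this, a call such as `"héllo wörld".char_substring(1, 4)` makes it explicit that the indices count `char`s rather than bytes or graphemes. The grapheme, word, and sentence traits could follow the same pattern, each declared under a `#[cfg(feature = "unicode")]` attribute if the single-feature option is chosen.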