Anders429 / substring

A substring method for string types.
Apache License 2.0

Support Different Kinds of Substrings #10

Open Anders429 opened 3 years ago

Anders429 commented 3 years ago

There are different kinds of substrings that this library could support. Currently, the implementation supports substrings with respect to chars, but some users will likely want substrings with respect to graphemes instead. Word and sentence substrings could also be supported using the relevant Unicode standards.
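To illustrate why char and grapheme substrings can differ, here is a minimal sketch using only the standard library. The free function `char_substring` is a hypothetical helper written for this example, not this crate's API. It shows a char-indexed range cutting an accent off its base letter, which a grapheme-based variant would keep together.

```rust
/// Hypothetical helper (not this crate's API): substring over `char` indices.
fn char_substring(s: &str, start: usize, end: usize) -> &str {
    // Byte offset of the `n`th char, or the string length if `n` is out of range.
    let offset = |n: usize| s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    let (from, to) = (offset(start), offset(end));
    if from >= to { "" } else { &s[from..to] }
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two chars, one grapheme.
    let text = "cafe\u{301}s";
    assert_eq!(text.chars().count(), 6);

    // A char range ending at index 4 separates the accent from its base letter.
    assert_eq!(char_substring(text, 0, 4), "cafe");

    // The caller has to know to include the combining mark explicitly.
    assert_eq!(char_substring(text, 0, 5), "cafe\u{301}");
}
```

A grapheme-indexed variant would treat the `e` plus accent as a single unit, so a range ending there could not split it.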

Altogether, I see the following substring variants being possible: char, grapheme, word, sentence, and byte.

Since we are already looking at a breaking change with #9, the Substring trait can be renamed to CharSubstring so that there is no ambiguity between substring variants. The unicode-segmentation variants (grapheme, word, and sentence) can be gated behind a unicode feature (or perhaps a separate feature for each?). The byte variant can wait for now: it isn't really needed, and slicing by bytes can cut through a multi-byte character, which conflicts with keeping strings well-formed.
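As a rough sketch of what the rename might look like, the following defines a `CharSubstring` trait for `str`. The method name `char_substring` and its signature are assumptions made for illustration, not a settled design. The end of `main` also shows the byte-variant concern using the standard `str::get`, which returns `None` when a byte range does not fall on char boundaries.

```rust
/// Sketch of the renamed trait; the method name and signature are assumptions.
pub trait CharSubstring {
    /// Returns the chars in `start..end`, clamped to the string's length.
    fn char_substring(&self, start: usize, end: usize) -> &str;
}

impl CharSubstring for str {
    fn char_substring(&self, start: usize, end: usize) -> &str {
        // Byte offset of the `n`th char, or the string length if out of range.
        let offset = |n: usize| self.char_indices().nth(n).map_or(self.len(), |(i, _)| i);
        let (from, to) = (offset(start), offset(end));
        if from >= to { "" } else { &self[from..to] }
    }
}

fn main() {
    // U+00E9 (precomposed e-acute) is one char but two bytes in UTF-8.
    let text = "h\u{e9}llo";
    assert_eq!(text.char_substring(1, 3), "\u{e9}l");

    // Byte ranges can land inside a char; `str::get` reports that as `None`.
    assert_eq!(text.get(1..2), None);
    assert_eq!(text.get(1..3), Some("\u{e9}"));
}
```

Naming the method after the unit it indexes leaves room for grapheme, word, and sentence traits to sit alongside it without ambiguity.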

This solution will give maximum clarity as to what this crate offers, and will give flexibility for users to choose from the various types of substrings offered.

Anders429 commented 3 years ago

WordSubstring and SentenceSubstring are both blocked, since the needed unicode_word_indices() and unicode_sentence_indices() are not currently available in unicode-segmentation. For now, just char and grapheme variants will be supported.