While BreakIterator provides solid low-level functionality for iterating forward and backward through breaks, it would be useful to have a simple way to do forward-only operations on string, StringBuilder, and char[].
We would ideally create a separate extension method (with overloads for an optional culture) for each of the 4 modes:
Word
Sentence
Line
Character
We could then expand on this with higher-level operations, such as providing an IEnumerable<string> that tokenizes the text so it can be iterated with a foreach loop.
foreach (var word in theText.ToWords(new CultureInfo("th-th")))
{
    // consume each word
}
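One possible shape for such an extension method is sketched below. It is only a sketch under stated assumptions: it assumes ICU4N's BreakIterator mirrors the ICU4J surface (`GetWordInstance` with a culture argument, `SetText`, `First`, `Next`, and a `Done` sentinel), and the `BreakIteratorExtensions` class and both `ToWords` overloads are hypothetical names, not existing API. It yields every span between boundaries, so whitespace and punctuation come back as separate items.

```csharp
using System.Collections.Generic;
using System.Globalization;
using ICU4N.Text;

// Hypothetical helper class; not part of ICU4N.
public static class BreakIteratorExtensions
{
    // Overload for the optional culture: falls back to the current culture.
    public static IEnumerable<string> ToWords(this string text)
    {
        return text.ToWords(CultureInfo.CurrentCulture);
    }

    public static IEnumerable<string> ToWords(this string text, CultureInfo culture)
    {
        // Assumption: GetWordInstance accepts a CultureInfo, analogous to
        // ICU4J's getWordInstance(Locale).
        BreakIterator breaker = BreakIterator.GetWordInstance(culture);
        breaker.SetText(text);

        // Forward-only walk: each pair of adjacent boundaries delimits one span.
        int start = breaker.First();
        for (int end = breaker.Next(); end != BreakIterator.Done; end = breaker.Next())
        {
            yield return text.Substring(start, end - start);
            start = end;
        }
    }
}
```

Because this is an iterator method, the body runs from the top on every enumeration, so each foreach loop gets its own BreakIterator rather than sharing one. The same pattern would apply to the sentence, line, and character modes by swapping the factory call.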
Some thought needs to be given to thread safety, since BreakIterator requires a separate clone for each thread.
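One way to handle this is to keep a single iterator as a template and clone it for each enumeration, as sketched below. The `WordSplitter` class is hypothetical, and the sketch assumes that `Clone()` is available on ICU4N's BreakIterator (as `clone()` is in ICU4J) and that cloning a template that is never given text or advanced is safe to do from several threads; both points would need to be verified against the actual implementation.

```csharp
using System.Collections.Generic;
using System.Globalization;
using ICU4N.Text;

// Hypothetical wrapper; not part of ICU4N.
public sealed class WordSplitter
{
    // Template only: never given text or advanced, just cloned.
    private readonly BreakIterator prototype;

    public WordSplitter(CultureInfo culture)
    {
        // Assumption: GetWordInstance accepts a CultureInfo.
        prototype = BreakIterator.GetWordInstance(culture);
    }

    public IEnumerable<string> Split(string text)
    {
        // Each enumeration works on a private clone, so no iterator state
        // is shared between threads (or between nested loops on one thread).
        var breaker = (BreakIterator)prototype.Clone();
        breaker.SetText(text);

        int start = breaker.First();
        for (int end = breaker.Next(); end != BreakIterator.Done; end = breaker.Next())
        {
            yield return text.Substring(start, end - start);
            start = end;
        }
    }
}
```

Cloning per enumeration trades a small allocation for not having to reason about which thread owns the iterator. A per-thread cache would avoid the allocation, but it breaks down if two enumerations are interleaved on the same thread.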
After an initial attempt, this turned out to be more complicated than first envisioned, because the definition of what qualifies as a "word" can vary. The approach needs to be rethought.