Proposed fix for #546. Use BreakIterator in PlainTextTokenMaker so that line wrapping is done on a per-locale, language-specific basis.
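For reviewers unfamiliar with `java.text.BreakIterator`, here is a minimal sketch of how a line-break iterator splits text into wrappable chunks for a given locale. It is not the code in this PR: the class `LineBreakSketch` and the method `segments` are hypothetical names used only for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of RSyntaxTextArea.
final class LineBreakSketch {

    // Splits text at every position where the locale's line-break rules
    // allow a line to wrap. Each chunk runs from one boundary to the next.
    static List<String> segments(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }
}
```

Calling `LineBreakSketch.segments("call foo). next", Locale.ENGLISH)` returns chunks that together reproduce the input. Trailing punctuation normally stays attached to the word before it, which is why a chunk-based token can contain more than just letters.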
Remaining issues:
Mark Occurrences marks tokens exactly as the token maker identifies them, which isn't intuitive to users (tokens are identified based on where line breaks can occur, so something like "foo)." is marked as a single token). We'll need to either disable Mark Occurrences for plain text, or create a custom PlainTextOccurrenceMarker that looks for e.g. strings of letters within tokens (which might be very slow).
Would need to delete PlainTextTokenMaker.flex, as PlainTextTokenMaker.java is now a hand-written TokenMaker.
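To make the "strings of letters within tokens" idea concrete, here is a rough sketch of the scan a custom occurrence marker could run over a token's text. It is only an assumption about one possible approach: `LetterRunSketch` and `letterRuns` are hypothetical names, and the sketch deliberately does not touch any RSyntaxTextArea API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RSyntaxTextArea.
final class LetterRunSketch {

    // Returns each maximal run of letters found in the token text,
    // skipping punctuation, digits, and whitespace.
    static List<String> letterRuns(String tokenText) {
        List<String> runs = new ArrayList<>();
        int start = -1; // index where the current letter run began, or -1 if none

        for (int i = 0; i < tokenText.length(); ) {
            int cp = tokenText.codePointAt(i);
            if (Character.isLetter(cp)) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                runs.add(tokenText.substring(start, i));
                start = -1;
            }
            i += Character.charCount(cp); // step by code point to handle surrogate pairs
        }
        if (start >= 0) {
            runs.add(tokenText.substring(start));
        }
        return runs;
    }
}
```

With this scan, the token text `foo).` yields the single run `foo`, which is what a user would expect to see marked. The scan is linear in the token length, so any slowness would come from running it over every token in the document rather than from the scan itself.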