The treatment of whitespace is inconsistent in DedupEndNote
DedupEndNote uses String.trim() to remove whitespace at the start and the end of strings.
String::trim defines space as any codepoint that is less than or equal to the space character codepoint (\u0020). Newer trimming methods define (white) space as any codepoint for which the Character::isWhitespace predicate returns true (from: https://stackoverflow.com/questions/51266582/difference-between-string-trim-and-strip-methods-in-java-11). String.strip(), added in Java 11, is an example of such a Unicode-aware method.
String::strip only works at the ends of a string: characters inside the string that are less than or equal to the space character codepoint (\u0020) are neither removed nor replaced by a normal space.
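A small Java 11+ sketch of where the two definitions disagree (the class TrimVsStrip and the method describe are made up for illustration): NUL is removed by trim() but not by strip(), EM SPACE is removed by strip() but not by trim(), and neither method touches NO-BREAK SPACE or characters inside the string.

```java
public class TrimVsStrip {

    // Hypothetical helper: reports the length left after trim() and after strip()
    static String describe(String s) {
        return "trim=" + s.trim().length() + " strip=" + s.strip().length();
    }

    public static void main(String[] args) {
        // NUL: <= U+0020, but Character.isWhitespace returns false
        System.out.println(describe("\u0000abc"));      // trim=3 strip=4
        // EM SPACE: > U+0020, but Character.isWhitespace returns true
        System.out.println(describe("\u2003abc"));      // trim=4 strip=3
        // NO-BREAK SPACE: > U+0020 and Character.isWhitespace returns false
        System.out.println(describe("\u00A0abc"));      // trim=4 strip=4
        // Control characters inside the string are left alone by both methods
        System.out.println(describe("a\u0001\u0009b")); // trim=4 strip=4
    }
}
```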
Record::normalizeToBasicLatin(...) removes all whitespace characters above "\u00FF". See a test in TextNormalizerTest
Runs of whitespace are not reduced to a single whitespace character.
[X] replace trim() by strip()
[X] replace all characters which are removed by trim() or strip() with a SPACE character before Record::normalizeToBasicLatin(...) is called
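One possible way to cover both checked items, sketched in Java 11+ under the assumption that a plain String helper is called right before Record::normalizeToBasicLatin(...). WhitespaceNormalizer and normalizeWhitespace are hypothetical names, not existing DedupEndNote code. The sketch also collapses runs of whitespace into a single space, which goes beyond the two checked items.

```java
public class WhitespaceNormalizer {

    // Hypothetical predicate: true for every char that String::trim (c <= U+0020)
    // or String::strip (Character.isWhitespace) would treat as removable space
    static boolean isTrimOrStripSpace(char c) {
        return c <= ' ' || Character.isWhitespace(c);
    }

    // Hypothetical helper: replaces each such char with a plain SPACE,
    // collapses runs of spaces into one, then strips both ends
    static String normalizeWhitespace(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean lastWasSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isTrimOrStripSpace(c)) {
                if (!lastWasSpace) {
                    sb.append(' ');
                }
                lastWasSpace = true;
            } else {
                sb.append(c);
                lastWasSpace = false;
            }
        }
        return sb.toString().strip();
    }

    public static void main(String[] args) {
        String in = "\u0000 Title\u2003\u2003of\tthe  paper\r";
        System.out.println("[" + normalizeWhitespace(in) + "]"); // prints [Title of the paper]
    }
}
```

NO-BREAK SPACE (\u00A0) is matched by neither predicate, so this sketch leaves it in place.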