Currently, landscape reads output in chunks of 1024 bytes. When a chunk ends with only part of a 3-byte UTF-8 character, matching the buffer against a regex raises an ArgumentError.
For example, 갱 is a 3-byte UTF-8 character ("\xEA\xB0\xB1") and is valid, but "\xEA\xB0" on its own is an invalid UTF-8 string.
This PR fixes the problem by replacing invalid UTF-8 byte sequences with an empty string.
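The failure and the fix can be reproduced in isolation. The sketch below is not the PR's actual diff: `strip_invalid_utf8` is a hypothetical helper name, and using `String#scrub` (Ruby 2.1+) is an assumption about how the replacement could be done.

```ruby
# Hypothetical helper (not taken from the PR): drop any invalid UTF-8
# byte sequences by replacing them with an empty string.
def strip_invalid_utf8(chunk)
  chunk.scrub("")
end

full    = "\xEA\xB0\xB1" # 갱, all three bytes present
partial = "\xEA\xB0"     # the same character cut off at a chunk boundary

full.valid_encoding?     # => true
partial.valid_encoding?  # => false

begin
  # Matching a string with a broken encoding is what raises the error.
  ("ok " + partial) =~ /ok/
rescue ArgumentError => e
  warn "regex match failed: #{e.message}"
end

cleaned = strip_invalid_utf8("ok " + partial)
cleaned.valid_encoding?  # => true
cleaned                  # => "ok "
```

The trade-off is that the character split across the boundary is dropped rather than reassembled.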
Thanks for the quick fix, @synthdnb. Might have made sense to simply try and buffer another KB of data rather than lose the codepoint with an empty string, but this will suffice.
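A rough sketch of the buffering alternative mentioned above, for comparison. Instead of reading a whole extra KB, it holds back the trailing bytes of an incomplete character and prepends them to the next read. `each_utf8_chunk` and its parameters are hypothetical names, not part of landscape, and the reader is assumed to be any IO-like object that responds to `read(size)`.

```ruby
require "stringio"

# Hypothetical helper: yields valid UTF-8 strings read from io in chunks of
# `size` bytes, carrying an incomplete trailing character into the next read.
def each_utf8_chunk(io, size = 1024)
  carry = "".b
  while (bytes = io.read(size))
    data = carry + bytes.b
    # A UTF-8 character is at most 4 bytes long, so at most 3 trailing
    # bytes can belong to a character that was cut off.
    max_cut = [3, data.bytesize].min
    cut = (0..max_cut).find do |n|
      data.byteslice(0, data.bytesize - n)
          .force_encoding(Encoding::UTF_8)
          .valid_encoding?
    end
    # If no short cut yields a valid prefix, the data is broken somewhere
    # other than the tail; fall back to scrubbing the whole chunk.
    cut ||= 0
    head  = data.byteslice(0, data.bytesize - cut).force_encoding(Encoding::UTF_8)
    carry = data.byteslice(data.bytesize - cut, cut)
    yield head.scrub("") unless head.empty?
  end
  # Bytes still held back at EOF never completed a character.
  leftover = carry.force_encoding(Encoding::UTF_8).scrub("")
  yield leftover unless leftover.empty?
end

# Usage: a 3-byte read size forces 갱 to straddle a chunk boundary.
chunks = []
each_utf8_chunk(StringIO.new("ab갱cd"), 3) { |text| chunks << text }
chunks # => ["ab", "갱c", "d"]
```

This keeps the split character intact at the cost of some extra bookkeeping per read.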