HertzDevil opened this issue 2 years ago (Open)
Is it even a good idea to keep this as a byte limit? Could we have overloads with character limits here?
Oh, I think this is a bug I probably introduced when we added `IO#peek` and the related optimization in `IO#gets` (`gets_peek`). If you remove that, and also remove the overridden `IO::Memory#gets` method, it works fine.
My thoughts: we should behave exactly like Ruby here. In Ruby the limit is in bytes, but if it falls in the middle of a codepoint you just continue reading until the end of that codepoint. You may end up returning more bytes than requested, but a limit is always a soft limit in my mind.
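For reference, a quick illustration of that soft-limit behavior in Ruby (the string and limit here are made up for the example):

```ruby
require "stringio"

io = StringIO.new("abc\u3042def") # "\u3042" (あ) takes 3 bytes in UTF-8
line = io.gets(4)                 # a 4-byte limit would split あ in half

# Ruby reads past the limit to finish the codepoint
p line          # => "abcあ"
p line.bytesize # => 6
```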
If the `limit` argument is passed to `IO#gets`, it behaves like a hard limit: the `IO` always consumes exactly that number of bytes when the delimiter is not found, which means it may stop in the middle of a UTF-8 byte sequence.

It also seems to ignore custom encodings, so `limit` represents the maximum number of bytes returned, not the maximum number of bytes read.

Ruby's behavior is to consume additional bytes until a character in the `IO`'s encoding is fully consumed, whether the `IO` uses UTF-8 or not, and it does so for successive `#gets` calls too. Contrast this with Crystal, which stops dead at the byte limit.
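Since the original snippets did not survive here, a sketch of the Ruby side of the comparison (assuming `StringIO` stands in for a regular `IO`; strings and limits are illustrative):

```ruby
require "stringio"

# UTF-8: the 4-byte limit falls inside あ, so Ruby finishes the character.
utf8 = StringIO.new("abc\u3042def")
p utf8.gets(4)  # => "abcあ" (6 bytes consumed, not 4)

# Non-UTF-8 encoding: あ is 2 bytes in Shift_JIS; a 1-byte limit still
# yields one whole character in the IO's own encoding.
sjis = StringIO.new("\u3042\u3044".encode(Encoding::Shift_JIS))
p sjis.gets(1)  # => "あ" as a 2-byte Shift_JIS string

# Successive #gets calls each complete a character as well.
succ = StringIO.new("\u3042\u3044\u3046")
3.times { p succ.gets(1) }  # => "あ", then "い", then "う"
```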
I think we should follow Ruby's behavior here. Note that we cannot consume fewer than `limit` bytes here: not all `IO`s are seekable, and we don't have something like `#unget`, so we cannot in general discard bytes forming an incomplete sequence after they have been read.
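To make the proposal concrete, here is a minimal sketch (in Ruby, with a hypothetical helper name) of the completion step: after the hard `limit` bytes are consumed, inspect the tail to decide how many extra continuation bytes a UTF-8 soft limit would need to read.

```ruby
# Hypothetical helper: given the bytes already read, return how many more
# continuation bytes are needed to finish a trailing UTF-8 sequence.
def utf8_bytes_missing(bytes)
  1.upto([3, bytes.size].min) do |back|
    b = bytes[-back]
    next if b & 0xC0 == 0x80             # continuation byte: scan further back
    len = if    b < 0x80           then 1 # ASCII
          elsif b & 0xE0 == 0xC0   then 2 # 2-byte lead
          elsif b & 0xF0 == 0xE0   then 3 # 3-byte lead
          elsif b & 0xF8 == 0xF0   then 4 # 4-byte lead
          else 1                          # invalid byte: treat as complete
          end
    return [len - back, 0].max
  end
  0 # empty buffer or only continuation bytes: nothing to complete
end

utf8_bytes_missing("abc\u3042".bytes[0, 4]) # trailing 0xE3 lead byte => 2
utf8_bytes_missing("abc\u3042".bytes)       # ends on a boundary      => 0
```

The caller would then read exactly that many additional bytes before giving up on finding the delimiter, matching Ruby's semantics without needing seek or unget.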