Closed by neon-sunset 8 months ago.
This is not used on the hot path. And it will not work on Windows. See the comment in the Readme.
Java code produces `\r\n`, so there is an implicit assumption about OS-native line endings.
The hot path just treats all chars `<= 13` as line endings...
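A rough Python sketch of what "treat every char `<= 13` as a line ending" means. It assumes the input is a single `bytes` buffer and that runs of terminator bytes (such as the CR and LF of `\r\n`) collapse into one separator. The name `split_lines_le13` is hypothetical, and the real hot path is C# and may behave differently (for example, around empty lines).

```python
def split_lines_le13(buf: bytes) -> list[bytes]:
    # Treat every byte value <= 13 as a line terminator. This covers LF (10)
    # and CR (13), but also TAB (9) and other control bytes, so it is only
    # safe if the data never contains such bytes inside a line.
    # Runs of terminators (e.g. CR followed by LF) are collapsed, so no
    # empty lines are produced.
    lines = []
    start = None  # start index of the current line, or None between lines
    for i, b in enumerate(buf):  # iterating over bytes yields ints
        if b <= 13:
            if start is not None:
                lines.append(buf[start:i])
                start = None
        elif start is None:
            start = i
    # A final line with no trailing terminator.
    if start is not None:
        lines.append(buf[start:])
    return lines
```

With this rule, `CRLF` and `LF` input produce the same lines, which is why the hot path does not need to know the OS-native line ending.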
The current implementation matches either `CR` or `LF`, and if `CR` is followed by `LF` it effectively advances by 1. The `CR` matching step is doing extra work the caller does not seem to care about: scanning for just `LF` will match on both `CRLF`s and `LF`s, and I don't see the loop accounting for the "holes" between segments (if it does, let me know!).
Either way, if this isn't on a hot path then it likely doesn't matter ¯\_(ツ)_/¯
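A Python sketch of the CR-or-LF behaviour described above makes the comparison concrete. It is the analogue of an `IndexOfAny` search for CR and LF that also consumes an `LF` directly after a `CR`. The function `split_cr_or_lf` is hypothetical, not the repository's code, and it drops the terminators from each line. On input made only of `CRLF` and `LF` lines it returns the same lines as scanning for `LF` alone and dropping a trailing `CR`. The extra `CR` check only changes the result when a lone `CR` appears.

```python
def split_cr_or_lf(buf: bytes) -> list[bytes]:
    # Hypothetical CR-or-LF line splitter: stop at the first CR or LF and,
    # if a CR is immediately followed by LF, skip the LF as well.
    lines = []
    start = 0
    i = 0
    n = len(buf)
    while i < n:
        b = buf[i]
        if b == 0x0A or b == 0x0D:  # hit on either CR or LF
            lines.append(buf[start:i])
            # CRLF: step over the LF too, so it does not start an empty line.
            if b == 0x0D and i + 1 < n and buf[i + 1] == 0x0A:
                i += 1
            i += 1
            start = i
        else:
            i += 1
    # A final line with no trailing terminator.
    if start < n:
        lines.append(buf[start:])
    return lines
```

Python's `bytes.splitlines()` breaks on the same three terminators (`\r`, `\n`, `\r\n`), so it is a convenient reference when checking a sketch like this.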
> Either way, if this isn't on a hot path then it likely doesn't matter
Yeah, it did matter and I rewrote that part. I will (very likely) rewrite the chunking logic and remove that.
But you are right that for boundaries we only need `\n`.
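Both `CRLF` and `LF` lines end in `\n`, so chunk boundaries can be aligned by searching for `\n` alone. Below is a rough Python sketch. It assumes the input is one in-memory `bytes` buffer and that each interior boundary is moved forward to just past the next `\n`. The helper `chunk_boundaries` is hypothetical and is not the project's chunking code.

```python
def chunk_boundaries(buf: bytes, n_chunks: int) -> list[tuple[int, int]]:
    # Split buf into at most n_chunks [start, end) ranges. Each interior
    # boundary is moved forward to just past the next b"\n"; because CRLF
    # lines also end in b"\n", a CR/LF pair is never split across chunks.
    size = len(buf)
    bounds = [0]
    for k in range(1, n_chunks):
        # Evenly spaced split point, never before the previous boundary.
        target = max(size * k // n_chunks, bounds[-1])
        lf = buf.find(b"\n", target)
        if lf == -1:
            break  # no newline left: the rest goes into the last chunk
        bounds.append(lf + 1)
    if bounds[-1] != size:
        bounds.append(size)
    return list(zip(bounds, bounds[1:]))
```

For example, `chunk_boundaries(b"a;1\r\nbb;2\ncc;3\r\nd;4\n", 3)` gives `[(0, 10), (10, 16), (16, 20)]`. Every chunk ends right after a `\n`, whether that line used `CRLF` or `LF`.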
Got it, thank you for your work. Hopefully it will bring more attention to our underappreciated language 😄
The newline scanner seems to include line endings, and its code looks very similar to how CoreLib's line iterator is implemented. Because the only line endings encountered are either `CRLF` or `LF`, there is no need to scan for lone `CR`s with `.IndexOfAny`: a lone `CR` is not a valid line ending on Windows or Unix systems. This is a very quick change and I have not looked into other code in this solution; it is just something that stood out to me because I worked on a similar problem some time ago here: https://github.com/U8String/U8String/blob/main/Sources/Primitives/U8Enumerators.cs#L505-L536
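Here is a minimal Python sketch of the proposed change. It assumes, as the description suggests, that each segment keeps its line ending, and it searches for `LF` alone (the analogue of `IndexOf` for `'\n'` instead of `IndexOfAny` for CR and LF). The name `iter_lines_lf_only` is hypothetical. It is not the code from this PR or from U8String.

```python
from collections.abc import Iterator


def iter_lines_lf_only(buf: bytes) -> Iterator[bytes]:
    # Search for LF only (no separate CR search). Each yielded segment keeps
    # its line ending: CRLF lines end in b"\r\n", LF lines end in b"\n".
    pos = 0
    while pos < len(buf):
        lf = buf.find(b"\n", pos)
        if lf == -1:
            yield buf[pos:]  # trailing data with no final newline
            return
        yield buf[pos:lf + 1]
        pos = lf + 1
```

For `b"a\r\nb\nc"` this yields `b"a\r\n"`, `b"b\n"` and `b"c"`. For `b"a\rb\n"` it yields the single segment `b"a\rb\n"`, whereas a CR-or-LF scanner would split at the `\r`.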