Why not have a single `apply` and a single `create_patch` that treat everything as bytes, whether or not it's UTF-8? That would avoid code duplication.
I'm sure there's a good reason for splitting them like this, but I can't think of it, which is why I'm asking.
At first blush, treating even UTF-8 text as bytes wouldn't seem to make any difference, would it?
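One way to get the single code path the question asks about is to write the core logic once over `&[u8]` and make the text version a thin wrapper around it. Below is a minimal Rust sketch of that idea, applied to line splitting (the first step a line-based diff performs). `split_lines_bytes` and `split_lines_str` are hypothetical helpers, not functions from the library being discussed. The wrapper is sound because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so cutting right after it always lands on a character boundary.

```rust
/// Hypothetical byte-level line splitter. Each piece keeps its trailing b'\n',
/// so a "\r\n" ending stays intact inside the piece.
fn split_lines_bytes(input: &[u8]) -> Vec<&[u8]> {
    let mut lines = Vec::new();
    let mut start = 0;
    for (i, &b) in input.iter().enumerate() {
        if b == b'\n' {
            lines.push(&input[start..=i]);
            start = i + 1;
        }
    }
    if start < input.len() {
        // Last line has no trailing newline.
        lines.push(&input[start..]);
    }
    lines
}

/// Hypothetical text wrapper that reuses the byte version instead of duplicating it.
fn split_lines_str(input: &str) -> Vec<&str> {
    split_lines_bytes(input.as_bytes())
        .into_iter()
        // Every cut sits right after a 0x0A byte, which is never part of a
        // multi-byte UTF-8 sequence, so each piece is itself valid UTF-8.
        .map(|piece| std::str::from_utf8(piece).expect("cut on a char boundary"))
        .collect()
}

fn main() {
    let text = "héllo\r\nwörld\n日本語";
    let lines = split_lines_str(text);
    // Agrees with the standard library's own str-based splitter.
    assert_eq!(lines, text.split_inclusive('\n').collect::<Vec<_>>());
    println!("{:?}", lines);
}
```

Keeping the `\r` inside each piece means both LF and CRLF files survive a split-and-rejoin unchanged. The sketch only covers splitting; it does not show whether the rest of a patch API (its output and return types, for example) can be shared as easily.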
chatgpt-4o says:

> In UTF-8 encoding, the bytes for `\n` (newline, ASCII value 10) and `\r` (carriage return, ASCII value 13) are not used as parts of multi-byte sequences. They are represented as single-byte characters.
>
> Here's a brief overview:
>
> - `\n` (newline) is represented as the single byte `0x0A` in UTF-8.
> - `\r` (carriage return) is represented as the single byte `0x0D` in UTF-8.
>
> UTF-8 encoding is designed such that multi-byte sequences do not contain values in the range of ASCII control characters (0x00 to 0x1F), which includes `\n` and `\r`. This design ensures that these control characters remain distinct and do not appear as part of any other character's multi-byte sequence. Therefore, in UTF-8, `\n` and `\r` are always interpreted as their respective ASCII control characters and not as part of any other characters.
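The quoted claim can be checked directly, and it is actually stronger than stated: every byte of a multi-byte UTF-8 sequence lies in 0x80 through 0xFF (lead bytes start with the bits `11`, continuation bytes with `10`), so no ASCII byte at all, not just 0x00 to 0x1F, can appear inside one. The Rust sketch below encodes every Unicode scalar value with the standard library's `char::encode_utf8` and asserts this; the helper name `multibyte_sequences_avoid_ascii` is made up for illustration.

```rust
/// Hypothetical helper: true if no multi-byte UTF-8 encoding contains a byte below 0x80.
fn multibyte_sequences_avoid_ascii() -> bool {
    let mut buf = [0u8; 4];
    // Walk every Unicode scalar value; char::from_u32 skips the surrogate range.
    (0..=0x10FFFFu32)
        .filter_map(char::from_u32)
        .all(|c| {
            let encoded = c.encode_utf8(&mut buf).as_bytes();
            // Single-byte (ASCII) encodings are fine; multi-byte ones must be all >= 0x80.
            encoded.len() == 1 || encoded.iter().all(|&b| b >= 0x80)
        })
}

fn main() {
    assert_eq!('\n'.len_utf8(), 1); // encoded as the single byte 0x0A
    assert_eq!('\r'.len_utf8(), 1); // encoded as the single byte 0x0D
    assert!(multibyte_sequences_avoid_ascii());
    println!("no multi-byte UTF-8 sequence contains 0x0A or 0x0D");
}
```

This is the property that makes a byte-level split on `b'\n'` safe for UTF-8 input. It does not carry over to other encodings such as UTF-16, where a newline takes two bytes and `0x0A` can occur inside other characters (U+010A in little-endian UTF-16 is `0x0A 0x01`).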
Thank you for your time and consideration.