bmwill / diffy

Tools for finding and manipulating differences between files
Apache License 2.0

What's the reason for having two of each: `apply`/`apply_bytes` and `create_patch`/`create_patch_bytes`? #33

Open correabuscar opened 3 months ago

correabuscar commented 3 months ago

Why not have only one `apply` and one `create_patch` that treat everything as bytes, whether or not the input is UTF-8? That would avoid code duplication.

I'm sure there's a good reason for splitting them like this, but I can't think of it, which is why I'm asking.

At first blush, it would seem that treating even UTF-8 as bytes wouldn't make any difference, would it?

"In UTF-8 encoding, the bytes for `\n` (newline, ASCII value 10) and `\r` (carriage return, ASCII value 13) are not used as parts of multi-byte sequences. They are represented as single-byte characters. Here's a brief overview:

- `\n` (newline) is represented as the single byte `0x0A` in UTF-8.
- `\r` (carriage return) is represented as the single byte `0x0D` in UTF-8.

UTF-8 encoding is designed such that multi-byte sequences do not contain values in the range of ASCII control characters (0x00 to 0x1F), which includes `\n` and `\r`. This design ensures that these control characters remain distinct and do not appear as part of any other character's multi-byte sequence. Therefore, in UTF-8, `\n` and `\r` are always interpreted as their respective ASCII control characters and not as part of any other characters." - chatgpt-4o
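To make the quoted property concrete, here is a minimal sketch using only the Rust standard library. It checks that splitting a UTF-8 string on `'\n'` and splitting its raw bytes on `0x0A` yield the same line boundaries, and that no byte of a multi-byte sequence falls in the ASCII range. The helper names `split_lines_bytes` and `multibyte_contains_ascii` are hypothetical and are not part of diffy; this says nothing about how diffy itself splits lines.

```rust
/// Hypothetical helper (not part of diffy): split a byte slice into lines,
/// keeping each trailing `\n`, by scanning for the single byte 0x0A.
fn split_lines_bytes(input: &[u8]) -> Vec<&[u8]> {
    input.split_inclusive(|&b| b == b'\n').collect()
}

/// Hypothetical helper: returns true if any byte that belongs to a
/// multi-byte UTF-8 sequence in `text` lies in the ASCII range (0x00..=0x7F).
fn multibyte_contains_ascii(text: &str) -> bool {
    text.char_indices()
        // Only look at characters encoded with more than one byte.
        .filter(|(_, c)| c.len_utf8() > 1)
        // Inspect the raw bytes of that character's encoding.
        .any(|(i, c)| {
            text.as_bytes()[i..i + c.len_utf8()]
                .iter()
                .any(|&b| b < 0x80)
        })
}

fn main() {
    let text = "héllo\nwörld\r\n日本語\n";

    // Line splitting done on the `str`, then viewed as bytes.
    let by_str: Vec<&[u8]> = text.split_inclusive('\n').map(str::as_bytes).collect();
    // Line splitting done directly on the bytes.
    let by_bytes = split_lines_bytes(text.as_bytes());

    // Both approaches find exactly the same line boundaries.
    assert_eq!(by_str, by_bytes);
    // Continuation and lead bytes of multi-byte sequences are all >= 0x80.
    assert!(!multibyte_contains_ascii(text));

    println!("{} lines, identical boundaries", by_bytes.len());
}
```

This only covers line splitting. One difference it does not remove is the return type: a bytes-only function would hand back a `Vec<u8>`, so a caller working with text would still need something like `String::from_utf8` to get a `String` back.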

Thank you for your time and consideration.