nozwock opened this issue 5 months ago (status: Open)
Hi @nozwock, I believe the codebase is well set up to attempt decoding of non-UTF-8 encodings: https://github.com/dandavison/delta/blob/e208f4ed52759fc018ee14808717c977df57b56f/src/delta.rs#L186
However, it doesn't seem to be requested often, and no one has actually added a fallback that tries other encodings.
As you can see, it attempts to decode each line as UTF-8 and, if that fails, falls back to Rust's String::from_utf8_lossy
function, which replaces invalid byte sequences with U+FFFD. I can imagine the results aren't terribly helpful for UTF-16.
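
To illustrate why the lossy fallback doesn't help much with UTF-16, here is a minimal standalone sketch of the "try UTF-8, then fall back to lossy" pattern described above. It is not delta's actual code; `decode_line` and `utf16le_with_bom` are hypothetical names used only for this example, and only the standard library is assumed.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not delta's code): decode raw bytes as UTF-8 and,
/// if that fails, fall back to lossy decoding, which substitutes U+FFFD
/// for every invalid byte sequence.
fn decode_line(raw: &[u8]) -> Cow<'_, str> {
    match std::str::from_utf8(raw) {
        Ok(s) => Cow::Borrowed(s),
        Err(_) => String::from_utf8_lossy(raw),
    }
}

/// Hypothetical helper: encode text as UTF-16LE bytes preceded by a
/// byte-order mark (FF FE), as many Windows tools write it.
fn utf16le_with_bom(text: &str) -> Vec<u8> {
    let mut bytes = vec![0xFF, 0xFE];
    for unit in text.encode_utf16() {
        bytes.extend_from_slice(&unit.to_le_bytes());
    }
    bytes
}
```

Passing `utf16le_with_bom("hi")` through `decode_line` yields two replacement characters for the BOM, followed by each ASCII letter and a NUL character. Without a BOM, ASCII-only UTF-16LE text is technically valid UTF-8, so it decodes "successfully" but with a NUL after every letter, which is no more readable in a diff.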
What do you think about incorporating an encoding option into the CLI, and then decoding based on that?
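
For discussion, a rough sketch of what decoding based on a user-supplied encoding could look like. Everything here is an assumption for illustration: delta has no such option in the code linked above, and the `Encoding` enum, `parse_encoding`, and `decode_with` names are hypothetical. It uses only the standard library, so it covers UTF-8 and UTF-16 only; a real implementation would probably want a dedicated encoding crate.

```rust
/// Hypothetical set of encodings a CLI option might accept.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Hypothetical parser for the option value, e.g. "utf-16le".
fn parse_encoding(name: &str) -> Option<Encoding> {
    match name.to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" => Some(Encoding::Utf8),
        "utf-16le" | "utf16le" => Some(Encoding::Utf16Le),
        "utf-16be" | "utf16be" => Some(Encoding::Utf16Be),
        _ => None,
    }
}

/// Decode raw bytes according to the chosen encoding, replacing anything
/// invalid with U+FFFD rather than failing.
fn decode_with(raw: &[u8], encoding: Encoding) -> String {
    match encoding {
        Encoding::Utf8 => String::from_utf8_lossy(raw).into_owned(),
        Encoding::Utf16Le => decode_utf16(raw, u16::from_le_bytes),
        Encoding::Utf16Be => decode_utf16(raw, u16::from_be_bytes),
    }
}

/// Combine byte pairs into UTF-16 code units and decode them lossily.
/// A trailing odd byte is ignored, and a leading BOM is stripped.
fn decode_utf16(raw: &[u8], to_unit: fn([u8; 2]) -> u16) -> String {
    let units: Vec<u16> = raw
        .chunks_exact(2)
        .map(|pair| to_unit([pair[0], pair[1]]))
        .collect();
    let text = String::from_utf16_lossy(&units);
    text.strip_prefix('\u{FEFF}').unwrap_or(&text).to_string()
}
```

With this, `decode_with(bytes, Encoding::Utf16Le)` turns the bytes `FF FE 68 00 69 00` into `"hi"`. One open design question is whether the encoding should be given explicitly, as sketched here, or guessed from a BOM when one is present.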
Is this intended? Converting UTF-16 files to UTF-8 just to do a diff would be a little cumbersome...