Open Jmuccigr opened 8 years ago
@Jmuccigr UTF-8 is the canonical encoding for Markdown. While IMHO this should be resolved, please also tell the app developer to fix their app. :)
Well, the other app is for generic text files, so I'll cut them some slack. :-) That said, I agree with you. UTF-8 is just a better choice.
For MacDown, really I don't mind if it converts everything to UTF-8 w/o BOM, but I think it should be able to open UTF-16 files and save them correctly, if it's not converting them.
Any thoughts on this, @uranusjr?
There are a lot of internals assuming UTF-8, and supporting UTF-16 for all of them could take a lot of work. But it is not impossible, and a good editor should support multiple encodings anyway.
The main problem (functionality-wise) would be picking an encoding to use when a file is opened. Foundation (Objective-C) lacks a native encoding-detection API prior to OS X 10.10[1], and rolling our own would probably be too much work. Would it be OK if we just always assume UTF-8 when opening a file, and offer a “Reload with Encoding” option after the file is opened?
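To make the proposal concrete, here is a minimal sketch of the "assume UTF-8, then reload with another encoding" flow. It is written in Python purely for illustration; MacDown itself is Objective-C, and the function names `open_assuming_utf8` and `reload_with_encoding` are hypothetical, not anything in MacDown.

```python
from typing import Optional, Tuple

DEFAULT_ENCODING = "utf-8"  # assumption: the editor's default, per the proposal above


def open_assuming_utf8(raw: bytes) -> Tuple[Optional[str], bool]:
    """Return (text, ok); ok is False when raw is not valid UTF-8."""
    try:
        return raw.decode(DEFAULT_ENCODING), True
    except UnicodeDecodeError:
        return None, False


def reload_with_encoding(raw: bytes, encoding: str) -> str:
    """Decode the same bytes again with an encoding the user picked."""
    return raw.decode(encoding)


# Python's "utf-16" codec prepends a BOM when encoding, so these bytes
# start with 0xFF or 0xFE, neither of which is valid in UTF-8.
sample = "# Héllo".encode("utf-16")

text, ok = open_assuming_utf8(sample)
if not ok:
    # Stand-in for the user choosing "Reload with Encoding" > UTF-16.
    text = reload_with_encoding(sample, "utf-16")
```

The point of the strict first pass is that a wrong guess fails loudly instead of silently showing garbage, which is what gives the user a reason to reach for the reload option.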
Again, I don't mind if MacDown converts to UTF-8, so long as it opens the UTF-16 file.
@uranusjr This is the encoding detection I have been using for years: https://github.com/JanX2/UniversalDetector It is excellent. The only thing it has problems with is detecting Mac Roman. Luckily, I have never seen a Markdown file using that encoding. ;)
@JanX2 Wow that’s cool. I’ve always known about UniversalDetector and its C++ ports, but wasn’t aware of such a ready-made framework. This will make things easy. I’ll probably put together something in the coming week.
I'm working with a new iPad and an app that defaults to saving files as UTF-16. MacDown wouldn't open one of these files, so I played around a little with the various UTF formats, using BBEdit to create and save them. Here's what I found out:
MacDown can open (as reported by the bash file command):
Can't open:
In other words, it looks like the presence of a BOM prevents MacDown from opening a UTF-16 file.
This seems like a bug.
P.S. MacDown seems to strip the BOM when saving UTF-8 files that have one. It also seems to mess up UTF-16 files when saving them, at least as viewed in BBEdit or pico.
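For reference, the behavior being asked for (open a file that has a BOM, and write it back unchanged) can be sketched as follows. This is an illustration in Python, not MacDown's code; `read_with_bom` and `write_with_bom` are hypothetical names, and the sketch deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 LE one.

```python
import codecs

# (BOM bytes, codec name) for the encodings discussed in this thread.
# Assumption: UTF-32 is out of scope for this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_with_bom(raw: bytes):
    """Return (text, codec, bom); fall back to BOM-less UTF-8."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            # Strip the BOM before decoding so it does not end up in the text.
            return raw[len(bom):].decode(codec), codec, bom
    return raw.decode("utf-8"), "utf-8", b""


def write_with_bom(text: str, codec: str, bom: bytes) -> bytes:
    """Re-encode with the codec and BOM the file was opened with."""
    return bom + text.encode(codec)


original = codecs.BOM_UTF16_LE + "# Title\n".encode("utf-16-le")
text, codec, bom = read_with_bom(original)
round_tripped = write_with_bom(text, codec, bom)
```

Remembering the detected codec and BOM alongside the text is what lets a save reproduce the original bytes; an editor that prefers to normalize everything to UTF-8 without a BOM would simply ignore those two values when writing.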