MacDownApp / macdown

Open source Markdown editor for macOS.
https://macdown.uranusjr.com/

Can't open UTF-16 files with BOM #523

Open Jmuccigr opened 8 years ago

Jmuccigr commented 8 years ago

I'm working with a new iPad and an app that defaults to saving files as UTF-16. MacDown wouldn't open one of these files, so I played around a little with the various UTF formats, using BBEdit to create and save them. Here's what I found out:

MacDown can open (as reported by the bash file command):

Can't open:

In other words, it looks like the presence of a BOM prevents MacDown from opening a UTF-16 file.
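To illustrate what BOM-aware opening could look like, here is a minimal Python sketch. It is not MacDown's code (MacDown is written in Objective-C); the helper names `sniff_bom` and `decode_text` are made up for this example, and it assumes that a file without a BOM is UTF-8.

```python
import codecs

# BOM table, checked in order. UTF-32 LE must come before UTF-16 LE
# because both start with the bytes FF FE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None


def decode_text(data: bytes) -> str:
    """Decode using the BOM if present; assume UTF-8 otherwise (assumption)."""
    encoding = sniff_bom(data) or "utf-8"
    # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM,
    # so it does not end up in the returned string.
    return data.decode(encoding)
```

With something like this, a UTF-16 file with a BOM decodes to the same text as its UTF-8 counterpart, which is the behavior the report asks for.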

This seems like a bug.

PS MacDown seems to strip the BOM when saving UTF-8 files that have them. It also seems to mess up UTF-16 files when saving them, at least when looked at with BBEdit or pico.

JanX2 commented 8 years ago

@Jmuccigr UTF-8 is the canonical encoding for Markdown. While IMHO this should be resolved, please also tell the app developer to fix their app. :)

Jmuccigr commented 8 years ago

Well, the other app is for generic text files, so I'll cut them some slack. :-) That said, I agree with you. UTF-8 is just a better choice.

For MacDown, really I don't mind if it converts everything to UTF-8 w/o BOM, but I think it should be able to open UTF-16 files and save them correctly, if it's not converting them.
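The two save behaviors described here can be sketched in a few lines of Python. This is only an illustration, not anything from MacDown; `convert_to_utf8` and `save_in_encoding` are hypothetical names, and the source encoding is assumed to be known already.

```python
import codecs


def convert_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a file's bytes as UTF-8 without a BOM."""
    # Decoding with "utf-16" consumes the BOM; encoding with "utf-8"
    # never writes one.
    return raw.decode(source_encoding).encode("utf-8")


def save_in_encoding(text: str, encoding: str) -> bytes:
    """Encode text for saving in the file's original encoding."""
    # Python's "utf-16" codec prepends a BOM in the platform's byte order,
    # so the result can be read back by BOM-aware tools.
    return text.encode(encoding)
```

Either path keeps the text intact; the difference is only whether the file on disk stays UTF-16 or becomes UTF-8.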

Jmuccigr commented 8 years ago

Any thoughts on this, @uranusjr?

uranusjr commented 8 years ago

There are a lot of internals assuming UTF-8, and supporting UTF-16 for all of them could take a lot of work. But it is not impossible, and a good editor should support multiple encodings anyway.

The main problem (functionality-wise) would be picking an encoding to use when a file is opened. Foundation lacks a native encoding-detection API prior to OS X 10.10 [1], and rolling our own would probably be too much work. Would it be OK if we just always assume UTF-8 when opening a file, and offer a “Reload with Encoding” option after the file is opened?

[1] https://developer.apple.com/library/mac/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/index.html#//apple_ref/occ/clm/NSString/stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:
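The "assume UTF-8, then reload with a chosen encoding" flow proposed above might look roughly like this. It is a Python sketch for illustration only, not the Objective-C that MacDown would actually need; `OpenResult`, `open_assuming_utf8`, and `reload_with_encoding` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenResult:
    text: Optional[str]   # decoded contents, or None if decoding failed
    encoding: str         # encoding that was attempted
    needs_reload: bool    # True when the user should pick another encoding


def open_assuming_utf8(raw: bytes) -> OpenResult:
    """First attempt: always try UTF-8, as proposed in the comment."""
    try:
        return OpenResult(raw.decode("utf-8"), "utf-8", False)
    except UnicodeDecodeError:
        # Do not guess; signal that a "Reload with Encoding" prompt is needed.
        return OpenResult(None, "utf-8", True)


def reload_with_encoding(raw: bytes, encoding: str) -> OpenResult:
    """Second attempt: decode with the encoding the user picked."""
    # Raises UnicodeDecodeError if the user's choice is wrong as well.
    return OpenResult(raw.decode(encoding), encoding, False)
```

A UTF-16 file with a BOM fails the first attempt because the byte 0xFF is never valid in UTF-8, so the reload option would be offered rather than the file being rejected outright.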

Jmuccigr commented 8 years ago

Again, I don't mind if MacDown converts to UTF-8, so long as it opens the UTF-16 file.

JanX2 commented 8 years ago

@uranusjr This is the encoding detection I have been using for years: https://github.com/JanX2/UniversalDetector It is excellent. The only thing it has problems with is detecting Mac Roman. Luckily, I have never seen a Markdown file using that encoding. ;)

uranusjr commented 8 years ago

@JanX2 Wow that’s cool. I’ve always known about UniversalDetector and its C++ ports, but wasn’t aware of such a ready-made framework. This will make things easy. I’ll probably put together something in the coming week.