Closed bitigchi closed 3 years ago
So what platform are you on, which editor do you use and was it set to use CP866?
I'm on macOS, using TextEdit. I used the Russian (DOS) encoding option; TextEdit does not show encodings by their code page names. There was also a Cyrillic (DOS) option, but I did not try it. I probably should have used that one.
Any chance you could open the file in TextEdit again, this time using the Cyrillic option? I'd like to get to the bottom of this, please.
Okay, if I save the file "as-is", git diff shows garbled characters, but a re-open in TextEdit shows the correct characters, as typed before. Unfortunately, for some reason, it does not allow me to "Save As" in order to select a different encoding; the encoding list is greyed out now. I'll try different combinations later; this is all I can report right now.
if I save the file "as-is", git diff shows garbled characters, but a re-open in TextEdit shows the correct characters,
That suggests to me that the editor wrote something other than CP866, and upon reopening it detected its encoding automatically. Does a Mac have the file command? On Linux with a clean checkout I see this:
freecom.git/strings$ file russian.err
russian.err: Non-ISO extended-ASCII text
freecom.git/strings$ file DEFAULT.err
DEFAULT.err: ASCII text
After you have written the garbled chars that git diff shows, I expect that you'll see something other than 'Non-ISO extended-ASCII text', probably UTF-8 I'd guess. Did you try checking it out afresh and opening in TextEdit's Cyrillic Russian mode?
freecom.git/strings$ file russian.err
russian.err: Non-ISO extended-ASCII text
This is the same output I see even after my fix commit.
After you have written the garbled chars that git diff shows, I expect that you'll see something other than 'Non-ISO extended-ASCII text', probably UTF-8 I'd guess. Did you try checking it out afresh and opening in TextEdit's Cyrillic Russian mode?
TextEdit does not have different modes to open files per se (if I understand correctly), it just opens and saves in the current encoding that the file is subject to.
This is the same output I see even after my fix commit.
So it's not UTF-8 or any other Unicode encoding, so it's some 8-bit text that uses chars > 127, i.e. text in some code page other than 437.
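For anyone who wants to repeat that reasoning without the file command, here is a minimal Python sketch. It only approximates the distinction drawn above (pure ASCII, valid UTF-8, or some other 8-bit text); it is not how file itself works, and the `classify` helper and the sample strings are made up for illustration.

```python
def classify(data: bytes) -> str:
    """Rough three-way classification of raw bytes (illustrative only)."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Bytes > 127 that are not valid UTF-8: some 8-bit code page,
        # but the bytes alone do not say which one.
        return "8-bit (unknown code page)"


# Hypothetical sample strings, not taken from russian.err.
sample_cp866 = "Ошибка".encode("cp866")
sample_utf8 = "Ошибка".encode("utf-8")

print(classify(b"Error"))
print(classify(sample_utf8))
print(classify(sample_cp866))
```

A file saved by an editor as UTF-8 would land in the second branch, which is the case being ruled out here.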
TextEdit does not have different modes to open files per se (if I understand correctly), it just opens and saves in the current encoding that the file is subject to.
Since it's not really possible to automatically detect which code page has been used, other than by heuristics like iterating through some code pages until a garbage word makes sense, given what you just said, it must come down to how the file is being opened in TextEdit. I think the key is that with a fresh checkout, TextEdit has to open it and you must see non-garbled text. I don't know Mac myself, but this doc suggests there's a method: https://support.apple.com/en-gb/guide/textedit/txted1028/1.16/mac/11.0
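The "iterate through some code pages" heuristic can be sketched in Python roughly like this. The candidate list, the scoring rule (share of decoded non-ASCII characters that are lowercase Russian letters) and the sample sentence are all assumptions for illustration; this is not what TextEdit or any other tool actually does.

```python
# Candidate code pages to try; an arbitrary illustrative selection.
CANDIDATES = ["cp437", "cp866", "cp855", "cp1251", "koi8_r"]


def russian_score(text: str) -> float:
    """Share of non-ASCII characters that are lowercase Russian letters."""
    high = [ch for ch in text if ord(ch) > 127]
    if not high:
        return 0.0
    lower = sum(1 for ch in high if "а" <= ch <= "я")
    return lower / len(high)


def guess_code_page(data: bytes, candidates=CANDIDATES) -> str:
    """Return the candidate whose decoding looks most like Russian text."""
    best_name, best_score = "", -1.0
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # this code page cannot represent these bytes
        score = russian_score(text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical sample sentence, encoded the way a DOS tool would store it.
sample = "Неверная команда или имя файла".encode("cp866")
print(guess_code_page(sample))
```

A heuristic like this needs a reasonably long sample and can still guess wrong, which is why checking the result by eye in the editor remains the real test.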
Interesting. I selected "Russian (DOS)" whilst opening, and it displayed correctly. So, everything is alright? Please let me know if you need more trials from me.
I think, just to be sure, after doing that you should save the file without making any changes, then do git diff. If you get no diff shown, then the method is good.
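The idea behind that check can be sketched in Python: an editor that reads and writes with the same 8-bit encoding reproduces the original bytes, while one that writes a different encoding does not. The `resave` helper and the sample bytes are hypothetical stand-ins, not TextEdit's actual behaviour.

```python
def resave(data: bytes, open_as: str, save_as: str) -> bytes:
    """Simulate an editor reading with one encoding and writing with another."""
    return data.decode(open_as).encode(save_as)


# Hypothetical stand-in for the file contents on disk.
original = "Ошибка".encode("cp866")

same = resave(original, "cp866", "cp866")     # opened and saved as CP866
changed = resave(original, "cp866", "utf-8")  # opened as CP866, saved as UTF-8

print(same == original)     # bytes unchanged, so git diff would show nothing
print(changed == original)  # bytes differ, so git diff would show changes
```

Note that an empty diff only shows the file was written back in the encoding it was read with; that the text also displays correctly is the separate check made when opening the file.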
I tested saving both with no changes and with some added text. Both outputs were as expected, all good.
Excellent, a mystery solved!
@PerditionC just in case it got lost in all the chatter here, there is a little fix to be applied in this PR.
My first try did not work (editing normally) so I just copied the wrong character from one of the other garbled characters. Cheated, but hey, it works. :)