WinMerge / winmerge

WinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
https://winmerge.org/
GNU General Public License v2.0

An XML file shows as ISO-10646-UC-2 encoding by default? #2342

Closed by Thorium 1 month ago

Thorium commented 1 month ago

I have a single XML file that fails to load correctly in WinMerge but opens fine in other editors. For some reason WinMerge detects the encoding as ISO-10646-UC-2, and all the letters are displayed as Chinese (CJK) characters. If I rename the file to .TXT, it displays correctly.

So I'm wondering whether this is a WinMerge bug or whether something is wrong with the file. It is not urgent; other XML files work fine.

I cannot attach an XML file directly to a GitHub issue, so I have put it in a zip file and attached that here. If you extract the file from the zip and open it with WinMerge, you should be able to reproduce the issue.

test.zip

sdottaka commented 1 month ago

This happens because the XML declaration in the file specifies UTF-16 (UCS-2) as the encoding, while the actual bytes are ASCII or UTF-8. WinMerge trusts the declaration and decodes each pair of bytes as one UTF-16 code unit, which is why the text appears as CJK characters. Strictly speaking, the file is not valid XML, but to display it correctly in WinMerge, uncheck the "Detect codepage info for these type of files: .html, .rc, xml" checkbox in the Codepage category of the Options window as shown below.

image
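To check whether another file has the same problem, the mismatch described above can be detected with a short script. This is a minimal sketch, not how WinMerge does its detection: the function names and the list of UTF-16 labels are assumptions made for illustration. It relies on one observation: if the XML declaration can be read directly from the raw bytes as ASCII, the file cannot really be UTF-16, because genuine UTF-16 text starts with a byte order mark or has a NUL byte next to every ASCII character.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Matches an XML declaration that is readable as plain ASCII bytes and
# captures the value of its encoding attribute.
DECL_RE = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

# Labels treated as "UTF-16" here (an illustrative list, not exhaustive).
UTF16_LABELS = {"utf-16", "utf-16le", "utf-16be", "ucs-2", "iso-10646-ucs-2"}


def declared_encoding(data):
    """Return the declared encoding if the declaration is ASCII-readable, else None."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def claims_utf16_but_is_not(data):
    """True when the bytes are ASCII-compatible but the declaration says UTF-16."""
    return declared_encoding(data) in UTF16_LABELS


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="utf-16"?><root/>'
    print(claims_utf16_but_is_not(sample))
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `claims_utf16_but_is_not`.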

Thorium commented 1 month ago

Thanks!

Changing <?xml version="1.0" encoding="utf-16"?> to <?xml version="1.0" encoding="utf-8"?> and re-saving the file fixed the problem.
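For anyone who needs to apply the same fix to several files, the edit can be scripted. The following is a rough sketch under stated assumptions: `fix_declaration` is a hypothetical helper, and it only rewrites the declaration after confirming that the bytes really decode as UTF-8, so a genuinely UTF-16 file is left alone (it raises an error instead).

```python
import re

# Group 1: everything up to the opening quote of the encoding value
# (optionally preceded by a UTF-8 byte order mark); group 2: the declared
# encoding; group 3: the closing quote.
DECL_ENCODING_RE = re.compile(
    rb'(\A(?:\xef\xbb\xbf)?<\?xml[^>]*?encoding\s*=\s*["\'])'
    rb'([A-Za-z0-9._-]+)'
    rb'(["\'])'
)


def fix_declaration(data):
    """Return data with the declared encoding replaced by utf-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8, in which
    case the declaration should not be changed blindly.
    """
    data.decode("utf-8")  # validation only; the decoded text is discarded
    return DECL_ENCODING_RE.sub(rb"\g<1>utf-8\g<3>", data, count=1)


if __name__ == "__main__":
    broken = b'<?xml version="1.0" encoding="utf-16"?><root/>'
    print(fix_declaration(broken).decode("utf-8"))
```

Data without an XML declaration passes through unchanged. When using this on real files, read and write them in binary mode so that no extra encoding conversion happens along the way.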