alek-sys / sublimetext_indentxml

Plugin for Sublime Text editor for reindenting XML and JSON files
MIT License

Reformatting breaks encoding #45

Closed SlyNet closed 8 years ago

SlyNet commented 11 years ago

If I reformat the following XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Tag><Tag1>Småhus</Tag1></Tag>

It produces

<?xml version="1.0" encoding="ISO-8859-1"?>
<Tag>
     <Tag1>Småhus</Tag1>
</Tag>

maxi4220 commented 8 years ago

I found the same issue. How could I help in fixing this?

stephanschielke commented 8 years ago

Same here. I have a file with <?xml version="1.0" encoding="ISO-8859-1"?> saved with ISO-8859-1 encoding.

I started the "Indent XML" command, which broke the encoding in the file.

stephanschielke commented 8 years ago

A quick but annoying workaround:

  1. Create a new temp file
  2. Save with encoding UTF-8
  3. Copy the ISO or whatever encoding XML content into it
  4. Change the header to encoding="UTF-8"
  5. Indent XML
  6. Change the header back to normal
  7. Copy the temp file contents to the original file
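
The steps above can be sketched in Python, for anyone who wants to script them. This is a minimal sketch and not the plugin's code: the helper name `indent_xml_keep_declaration` is hypothetical, and it assumes the XML has already been decoded into a string with the correct codec. It sets the original header aside, gives the parser plain UTF-8 bytes, reindents, and then puts the header back.

```python
import re
from xml.dom import minidom

# Matches a leading XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>\s*')


def indent_xml_keep_declaration(xml_text, indent="    "):
    """Hypothetical helper: reindent decoded XML text and keep its header as written."""
    match = XML_DECL.match(xml_text)
    declaration = match.group(0).strip() if match else ""
    body = xml_text[match.end():] if match else xml_text

    # Steps 2-4 of the workaround: the parser only ever sees UTF-8 bytes,
    # with no declaration that could contradict them.
    doc = minidom.parseString(body.encode("utf-8"))

    # Step 5: pretty-print the root element only (no generated declaration).
    pretty = doc.documentElement.toprettyxml(indent=indent)

    # Step 6: restore the original header unchanged.
    return (declaration + "\n" if declaration else "") + pretty


if __name__ == "__main__":
    source = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<Tag><Tag1>Sm\u00e5hus</Tag1></Tag>'
    print(indent_xml_keep_declaration(source))
```

Reading the file with the right codec and writing the result back with the same codec (step 7) is left to the caller.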

alek-sys commented 8 years ago

Hi guys. First of all, the cause of the original issue is a mismatch between the encoding="ISO-8859-1" declared in the XML header of the source XML and the encoding of the file itself. Note that although your file is saved as UTF-8, the XML declares that it uses the ISO-8859-1 standard. And the å symbol is NOT a part of this standard. Thus the formatter converts å into a string according to the ISO-8859-1 standard.
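
The garbled output in the report is what appears when bytes written as UTF-8 are read back as ISO-8859-1. The snippet below is only an illustration of that mismatch, under the assumption that this is what happens somewhere in the formatting path; the plugin's internals are not shown in this thread, and the function name is hypothetical.

```python
def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode the same bytes as ISO-8859-1."""
    utf8_bytes = text.encode("utf-8")       # 'å' becomes the two bytes 0xC3 0xA5
    return utf8_bytes.decode("iso-8859-1")  # each byte is read as a separate character


if __name__ == "__main__":
    print(misread_utf8_as_latin1("Sm\u00e5hus"))  # prints "SmÃ¥hus"
```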

So the solution is: you need to decide which encoding your XML data (not the file) uses. If it is ISO-8859-1, it should not contain non-Latin symbols. If it is UTF-8, just remove the XML header or put 'UTF-8' there, and formatting will work fine.
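
A rough way to spot such a mismatch before formatting is sketched below. It is a heuristic under stated assumptions, not something the plugin does: both function names are hypothetical, and the check only flags files whose raw bytes are valid UTF-8 with non-ASCII characters while the header declares a different encoding.

```python
import re

# Pulls the encoding name out of a leading XML declaration in raw bytes.
DECL_ENCODING = re.compile(rb'^\s*<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(raw):
    """Return the encoding named in the XML header, or None if there is none."""
    match = DECL_ENCODING.match(raw)
    return match.group(1).decode("ascii") if match else None


def header_disagrees_with_bytes(raw):
    """Heuristic: True if the bytes look like UTF-8 but the header says otherwise."""
    declared = declared_encoding(raw)
    if declared is None or declared.lower() in ("utf-8", "utf8"):
        return False
    if not any(byte > 0x7F for byte in raw):
        return False  # pure ASCII reads the same in both encodings
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so the header may well be right
    return True
```

For example, a file with the ISO-8859-1 header from this issue whose bytes were saved as UTF-8 is flagged, while the same content saved as ISO-8859-1 is not.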

So sorry, but I don't think I'll be resolving this any time soon. If you have any ideas on how to carefully reconcile the XML file encoding with the XML header encoding, feel free to submit pull requests.

Cheers, Alexey.