Open nportelli opened 11 years ago
Are you sure the source file is encoded as UTF-16? It definitely won't parse a single-byte encoded file using a double-byte decoder. (Chris)
On Tue, Jul 2, 2013 at 1:49 PM, Nick Portelli notifications@github.com wrote:
I'm a bit unfamiliar with utf, but is there a reason why it won't parse if it is utf-16?
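To illustrate the mismatch described above, here is a minimal sketch that uses plain `xml.dom.minidom` outside Sublime Text. The helper `parses` and the sample document are made up for illustration and are not part of the plugin. It assumes the file's bytes are single-byte (UTF-8) text while the declaration still claims UTF-16, which appears to be the situation in this issue.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def parses(data: bytes) -> bool:
    """Return True if minidom can parse the given bytes (hypothetical helper)."""
    try:
        parseString(data)
        return True
    except ExpatError:
        return False


# Sample document whose declaration claims UTF-16.
DECLARED_UTF16 = '<?xml version="1.0" encoding="utf-16"?><root>text</root>'

# Declaration says UTF-16, but the bytes are really UTF-8: the parser rejects it.
mismatched = DECLARED_UTF16.encode("utf-8")

# Declaration says UTF-16 and the bytes really are UTF-16 (with a BOM).
matching = DECLARED_UTF16.encode("utf-16")

# Changing 16 to 8 in the declaration makes it agree with the UTF-8 bytes.
relabelled = DECLARED_UTF16.replace("utf-16", "utf-8").encode("utf-8")

for name, data in [("mismatched", mismatched),
                   ("matching", matching),
                   ("relabelled", relabelled)]:
    print(name, "parses" if parses(data) else "fails")
```

The point is that the declaration and the actual byte encoding have to agree; UTF-16 itself is not the problem.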
I'm not sure. I think it is whatever the default .NET serializer we are using produces. All I need to do is change 16 to 8 and your plugin works great, so in reality this is not the plugin's issue. I should figure out how to make the serializer save in UTF-8. Go ahead and close this.
I had the same issue. As a workaround, I temporarily remove the utf-16 encoding from the XML declaration and add it back before returning the formatted string. I'm not sure this counts as a proper fix. Tested on Sublime Text 3. Look for "fix:" in the following code:
import re
from xml.dom.minidom import parseString

import sublime


# BaseIndentCommand is defined elsewhere in the plugin.
class IndentXmlCommand(BaseIndentCommand):
    def indent(self, s):
        # convert to utf-8 bytes
        s = s.encode("utf-8")
        # non-greedy, so only the first <?...?> on the line is captured
        xmlheader = re.compile(rb"<\?.*?\?>").match(s)
        # fix: replace header with one that declares no encoding
        # (bytes literal, because s is bytes at this point)
        if xmlheader:
            s = s.replace(xmlheader.group(), b'<?xml version="1.0"?>', 1)
        # convert to plain string without indents and spaces
        s = re.compile(rb'>\s+([^\s])', re.DOTALL).sub(rb'>\g<1>', s)
        # replace tags to convince minidom to process cdata as text
        s = s.replace(b'<![CDATA[', b'%CDATAESTART%').replace(b']]>', b'%CDATAEEND%')
        try:
            s = parseString(s).toprettyxml()
        except Exception:
            sublime.active_window().run_command("show_panel", {"panel": "console", "toggle": True})
            raise
        # remove line breaks
        s = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub(r'>\g<1></', s)
        # restore cdata
        s = s.replace('%CDATAESTART%', '<![CDATA[').replace('%CDATAEEND%', ']]>')
        # remove the xml header added by toprettyxml
        s = s.replace("<?xml version=\"1.0\" ?>", "").strip()
        # fix: put the original header back
        if xmlheader:
            s = xmlheader.group().decode("utf-8") + "\n" + s
        return s
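For anyone who wants to try the idea outside Sublime Text, here is a rough standalone sketch of the same workaround. It is a variation on the code above, not the plugin's actual implementation: it strips the declaration before parsing instead of swapping it for a plain one, and it skips the CDATA handling. The function name `pretty_print_keeping_declaration` and the `_DECLARATION` pattern are hypothetical.

```python
import re
from xml.dom.minidom import parseString

# Matches a leading XML declaration such as <?xml version="1.0" encoding="utf-16"?>
_DECLARATION = re.compile(r"^\s*<\?xml.*?\?>")


def pretty_print_keeping_declaration(text: str) -> str:
    """Pretty-print XML text while keeping its original declaration verbatim."""
    match = _DECLARATION.match(text)
    # Parse only the part after the declaration, so the declared encoding
    # can never disagree with the UTF-8 bytes handed to the parser.
    body = text[match.end():] if match else text
    pretty = parseString(body.encode("utf-8")).toprettyxml(indent="  ")
    # toprettyxml() prepends its own declaration line; drop it.
    if pretty.startswith("<?xml"):
        pretty = pretty.split("\n", 1)[1]
    pretty = pretty.strip()
    # Put the original declaration back in front of the formatted body.
    if match:
        return match.group().strip() + "\n" + pretty
    return pretty


if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="utf-16"?><a><b>x</b></a>'
    print(pretty_print_keeping_declaration(sample))
```

Note that the returned string still declares utf-16, so whatever saves it needs to write UTF-16 bytes, otherwise the next parser will run into the same mismatch.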