alek-sys / sublimetext_indentxml

Plugin for Sublime Text editor for reindenting XML and JSON files
MIT License

XML has utf-16 fails to parse #41

Open nportelli opened 11 years ago

nportelli commented 11 years ago

I'm a bit unfamiliar with utf, but is there a reason why it won't parse if it is utf-16?

TheChrisPratt commented 11 years ago

Are you sure the source file is encoded as UTF-16? It definitely won't parse a single-byte encoded file using a double-byte decoder. (Chris)
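
To illustrate the mismatch described above, here is a minimal sketch in Python (the plugin's language). It assumes the parser is handed UTF-8 bytes while the declaration still claims utf-16, which may or may not be exactly what happens inside the plugin; the sample document and the `parses` helper are made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Made-up sample document whose declaration claims UTF-16.
doc = '<?xml version="1.0" encoding="utf-16"?><root><item>1</item></root>'

def parses(data):
    """Return True if minidom can parse the given bytes (hypothetical helper)."""
    try:
        parseString(data)
        return True
    except ExpatError:
        return False

# The bytes really are UTF-16 (with BOM): declaration and content agree.
utf16_ok = parses(doc.encode("utf-16"))
# The bytes are UTF-8 but the declaration says utf-16: they disagree.
utf8_mismatch_ok = parses(doc.encode("utf-8"))
print(utf16_ok, utf8_mismatch_ok)
```

If the second call fails while the first succeeds, the problem is the disagreement between the declared encoding and the actual bytes, not UTF-16 as such.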

On Tue, Jul 2, 2013 at 1:49 PM, Nick Portelli notifications@github.com wrote:

I'm a bit unfamiliar with utf, but is there a reason why it won't parse if it is utf-16?


nportelli commented 11 years ago

I'm not sure. I think it is whatever the default .NET serializer we are using produces. All I need to do is change 16 to 8 and your plugin works great. So in reality this is not the plugin's issue. I should figure out how to make the serializer save in UTF-8. Go ahead and close this.

domduke12 commented 10 years ago

I had the same issue. As a workaround, I temporarily replace the XML declaration to remove utf-16, then add the original declaration back before returning the formatted string. I'm not sure this is a proper fix. Tested on Sublime Text 3. Look for "fix:" in the following code...

import re
import sublime
from xml.dom.minidom import parseString

# BaseIndentCommand is defined elsewhere in the plugin.
class IndentXmlCommand(BaseIndentCommand):
    def indent(self, s):
        # convert to utf-8 bytes
        s = s.encode("utf-8")
        # match the leading declaration (non-greedy, so it stops at the first "?>")
        xmlheader = re.compile(br"<\?.*?\?>").match(s)
        # fix: replace the header with one that declares no encoding
        # (the replacement must be bytes, because s is bytes at this point)
        if xmlheader:
            s = s.replace(xmlheader.group(), b'<?xml version="1.0"?>', 1)
        # convert to plain string without indents and spaces
        s = re.compile(br'>\s+([^\s])', re.DOTALL).sub(br'>\g<1>', s)
        # replace tags to convince minidom to process cdata as text
        s = s.replace(b'<![CDATA[', b'%CDATAESTART%').replace(b']]>', b'%CDATAEEND%')
        try:
            s = parseString(s).toprettyxml()
        except Exception:
            sublime.active_window().run_command("show_panel", {"panel": "console", "toggle": True})
            raise
        # remove line breaks
        s = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub(r'>\g<1></', s)
        # restore cdata
        s = s.replace('%CDATAESTART%', '<![CDATA[').replace('%CDATAEEND%', ']]>')
        # remove the xml header added by toprettyxml()
        s = s.replace('<?xml version="1.0" ?>', "").strip()
        # fix: put the original header back
        if xmlheader:
            s = xmlheader.group().decode("utf-8") + "\n" + s
        return s
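
For anyone who wants to try the idea outside Sublime Text, here is a standalone sketch of the same workaround: take the declaration off, pretty-print the rest, then put the original declaration back. It is not the plugin's code. The `indent_xml` function and `DECLARATION` pattern are hypothetical names, and the plugin's CDATA handling and whitespace collapsing are left out to keep it short.

```python
import re
from xml.dom.minidom import parseString

# Hypothetical pattern for a leading XML declaration.
DECLARATION = re.compile(r"^\s*<\?xml.*?\?>", re.DOTALL)

def indent_xml(text):
    """Pretty-print XML text while preserving its original declaration."""
    header = DECLARATION.match(text)
    body = text[header.end():] if header else text
    # Parse without the declaration so its encoding attribute cannot
    # clash with the UTF-8 bytes handed to the parser.
    pretty = parseString(body.encode("utf-8")).toprettyxml()
    # toprettyxml() writes its own declaration as the first line; drop it.
    pretty = pretty.split("\n", 1)[1].strip()
    if header:
        pretty = header.group().strip() + "\n" + pretty
    return pretty

sample = '<?xml version="1.0" encoding="utf-16"?><root><item>1</item></root>'
result = indent_xml(sample)
print(result)
```

The returned string still carries the utf-16 declaration, so whatever writes it to disk has to encode it as UTF-16 for the file to stay consistent.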