Leonidas-from-XIV / node-xml2js

XML to JavaScript object converter.
MIT License

Get XML encoding #511

Open kirill-zhirnov opened 5 years ago

kirill-zhirnov commented 5 years ago

Hello! First of all, thank you for your efforts!

I have XML documents with different encodings. I need to read the "encoding" attribute from the XML declaration:

<?xml version="1.0" encoding="windows-1251"?>

The only way I've found is pretty tricky:

parser = new xml2js.Parser({
    trim: true,
    explicitArray: false
})
parser.saxParser.onprocessinginstruction = (val) ->
    console.log 'extract encoding from string:', val

Is there a simpler way?

knoxcard commented 5 years ago

Here it is:

var xml2js = require('xml2js')
var parser = new xml2js.Parser({
  trim: true,
  explicitArray: false
})
console.log(parser.options.xmldec.version)   // -> 1.0
console.log(parser.options.xmldec.encoding)  // -> UTF-8

Full output of parser.options.xmldec:

console.log(JSON.stringify(parser.options.xmldec, null, 4))
// {
//     "version": "1.0",
//     "encoding": "UTF-8",
//     "standalone": true
// }

@kirill-zhirnov - close issue?

kirill-zhirnov commented 5 years ago

thx, yes :)

knoxcard commented 5 years ago

@kirill-zhirnov - you have to close the issue by clicking on the "close issue" button at the bottom on your side. Thanks!

Hamper commented 4 years ago

console.log(JSON.stringify(parser.options.xmldec, null, 4))

This returns the parser options, not the encoding attribute from the XML declaration of the parsed document.

Omega-Ariston commented 4 years ago

I couldn't find a way to extract this attribute directly either, but I found two ways to extract it from the text:

  1. If you want to know the value of 'encoding' before parsing, you can extract it from the XML string directly.

    const data = `
    <?xml version="1.0" encoding="windows-1251"?>
    <root>
    <username>test</username>
    </root>
    `;
    let encoding = 'utf-8'; // default when no encoding is declared

    // example 1: extract the encoding from the XML string
    // The capture group holds the value between the double quotes.
    const match = /<\?xml[^>]*?\sencoding="([^"]+)"/.exec(data);
    if (match) {
        encoding = match[1];
        console.log(encoding); //-> windows-1251
    }

  2. You can also parse the string in the 'onprocessinginstruction' event, as you mentioned.

```javascript
    // example 2: extract the encoding in the processing-instruction handler
    parser.saxParser.onprocessinginstruction = function (node) {
        // The XML declaration is reported as a processing instruction named 'xml'.
        if (node.name === 'xml') {
            const match = /encoding="([^"]+)"/.exec(node.body);
            if (match) {
                encoding = match[1];
                console.log(encoding); //-> windows-1251
            }
        }
    };
```
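
If the document arrives as raw bytes, the first approach can be wrapped in a small standalone helper that runs before any decoding or parsing. This is only a sketch: `detectXmlEncoding` and its `fallback` parameter are hypothetical names and are not part of xml2js or sax. It assumes an ASCII-compatible encoding (so no UTF-16 input) and no byte order mark when a Buffer is passed.

```javascript
// Hypothetical helper (not an xml2js API): read the encoding label from the
// XML declaration at the very start of a document.
function detectXmlEncoding(input, fallback = 'utf-8') {
    // Only the beginning of the document matters. Decoding a Buffer as latin1
    // maps each byte to one character, so an ASCII declaration stays readable
    // whatever the real encoding of the rest of the document is.
    const head = Buffer.isBuffer(input)
        ? input.subarray(0, 256).toString('latin1')
        : String(input).slice(0, 256);

    // Accept a declaration only at the start (leading whitespace is tolerated,
    // as in the sample above), so later processing instructions are ignored.
    const declaration = /^\s*<\?xml\s[^?]*\?>/.exec(head);
    if (!declaration) {
        return fallback;
    }

    // Allow single or double quotes around the value.
    const match = /\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/.exec(declaration[0]);
    return match ? match[2] : fallback;
}

console.log(detectXmlEncoding(Buffer.from('<?xml version="1.0" encoding="windows-1251"?><root/>')));
//-> windows-1251
```

The returned label could then be handed to a decoder (for example `TextDecoder` or a library such as iconv-lite) before passing the resulting string to the parser. Which labels are supported depends on the runtime or library used.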