Leonidas-from-XIV / node-xml2js

XML to JavaScript object converter.
MIT License

Error: Non-whitespace before first tag. Line: 0 Column: 1 Char:  #595

Open · ghost opened this issue 3 years ago

ghost commented 3 years ago

This should work out of the box, since xml2js has now code to remove the byte order mark before parsing.

Originally posted by @Leonidas-from-XIV in https://github.com/Leonidas-from-XIV/node-xml2js/issues/345#issuecomment-264904146
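The fix quoted above targets the byte order mark (U+FEFF) specifically. The following is a minimal sketch of that kind of stripping. It uses a hypothetical helper, `stripLeadingBOM`, not xml2js's own code, and shows why a different leading character such as 0x1F would pass through untouched:

```javascript
// Hypothetical helper, not xml2js's actual implementation:
// removes a single leading byte order mark (U+FEFF) and nothing else.
function stripLeadingBOM(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// A BOM-prefixed document is cleaned up...
console.log(stripLeadingBOM('\uFEFF<rss></rss>')); // prints <rss></rss>

// ...but a document that starts with another control character is left as-is,
// so a strict parser would still see non-whitespace before the first tag.
const withControlChar = '\u001F<rss></rss>';
console.log(stripLeadingBOM(withControlChar) === withControlChar); // prints true
```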

But that doesn't seem to hold for other leading characters. To reproduce the error:

curl -O https://habd.as/post/show-latest-posts-github-profile/assets/index.xml
npm i xml2js@0.4.23

Drop in the script provided in #345:

const fs = require('fs');
const parseString = require('xml2js').parseString;
const fileToParse = './index.xml';
const outputFilename = 'output.json';

fs.readFile(fileToParse, function (err, data) {
    if (err) throw err; // the input file could not be read
    parseString(data, function (err, result) {
        console.dir(result);
        if (err) {
            console.log(err);
        }
        // do something with the JS object called result, or see below to save it as JSON
        fs.writeFile(outputFilename, JSON.stringify(result), (writeErr) => {
            if (writeErr) throw writeErr;
            console.log('The file has been saved to ' + outputFilename);
        });
    });
});

And you'll get the output:


undefined
Error: Non-whitespace before first tag.
Line: 0
Column: 1
Char: 
    at error (/Users/jos/Developer/node_modules/sax/lib/sax.js:651:10)
    at strictFail (/Users/jos/Developer/node_modules/sax/lib/sax.js:677:7)
    at beginWhiteSpace (/Users/jos/Developer/node_modules/sax/lib/sax.js:951:7)
    at SAXParser.write (/Users/jos/Developer/node_modules/sax/lib/sax.js:1006:11)
    at Parser.exports.Parser.Parser.parseString (/Users/jos/Developer/node_modules/xml2js/lib/parser.js:323:31)
    at Parser.parseString (/Users/jos/Developer/node_modules/xml2js/lib/parser.js:5:59)
    at exports.parseString (/Users/jos/Developer/node_modules/xml2js/lib/parser.js:369:19)
    at /Users/jos/Developer/balibebas-gh-repo/parse.js:8:5
    at FSReqCallback.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:61:3)
The file has been saved to output.json

Note that the XML parses as expected when opened in Firefox and Chrome: https://habd.as/post/show-latest-posts-github-profile/assets/index.xml

I tried removing the first character (0x1F) from the file with a hex editor and running the parsing script above again, but the issue persists. I suspect this may be an issue in the SAX parser, but I'm unable to identify the root cause.
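One possible explanation, offered as an unverified assumption rather than a confirmed diagnosis: the bytes 0x1F 0x8B are the gzip magic number. If the downloaded file is a gzip stream rather than plain XML, a strict parser would fail at the very first character, while browsers decompress such responses transparently. Removing only the first byte would also leave the rest of the compressed data in place. The sketch below uses a hypothetical helper, `toXmlText`. It checks for those two bytes and, if present, inflates the buffer with Node's built-in `zlib.gunzipSync` before the text would be handed to a parser:

```javascript
const zlib = require('zlib');

// Hypothetical helper: returns XML text from a Buffer that may be gzip-compressed.
// Assumption: a leading 0x1F 0x8B means the bytes are a gzip stream, not raw XML.
function toXmlText(buf) {
  const isGzip = buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
  const raw = isGzip ? zlib.gunzipSync(buf) : buf;
  return raw.toString('utf8');
}

// Self-check: a gzipped XML document starts with 1f 8b and inflates back to the original.
const xml = '<?xml version="1.0"?><rss></rss>';
const compressed = zlib.gzipSync(Buffer.from(xml, 'utf8'));
console.log(compressed[0].toString(16), compressed[1].toString(16)); // prints 1f 8b
console.log(toXmlText(compressed) === xml); // prints true
```

In the `fs.readFile` callback, `data` could be passed through such a helper before it reaches `parseString`. Plain, uncompressed XML is decoded unchanged.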

Shelrothman commented 3 years ago

I don't think this is the most elegant approach, but I was able to get rid of the error by parsing the XML as a string.

// myRequest is assumed to be an XMLHttpRequest whose response has already arrived
let data = JSON.parse(JSON.stringify(myRequest.responseText));
parseString(data, (err, result) => {
  console.dir(result);
  if (err) {
    console.log(err);
  }
});
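Note that `JSON.parse(JSON.stringify(s))` returns an identical string for any string `s`, so that line by itself does not change the XML. A plausible reading, which is only an assumption, is that the workaround succeeds because `responseText` is already-decoded text rather than the raw bytes saved on disk. The following quick check shows that the round trip is a no-op, even for strings with a BOM or a control character:

```javascript
// Round-tripping a string through JSON yields an identical string.
function jsonRoundTrip(s) {
  return JSON.parse(JSON.stringify(s));
}

const samples = ['<rss></rss>', '\uFEFF<rss/>', '\u001F<rss/>', 'caf\u00e9 \u{1F600}'];
for (const s of samples) {
  console.log(jsonRoundTrip(s) === s); // prints true for each sample
}
```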