danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

Uncatchable error when parsing [slightly] invalid XML #181

Closed renefournier closed 7 years ago

renefournier commented 7 years ago

Hi Dan,

I'm using a package that wraps your node-feedparser for non-streaming applications, and I'm encountering a problem that seems to be a SAX error thrown inside your package, but which I can't catch. It happens rarely, but here's one example:

```
const request       = require('request')
  ,   parser        = require('node-feedparser')
  ,   parseString   = require('xml2js').parseString
  ;

var url = 'http://mckinseyhightech.com/Digital/Digital.xml'

request (url, function(error, response, body) {
  try {
    parseString(body, function (err, result) {
      if (err) {
        console.error('Error', err);
      }
      console.dir(result);
      if (result.rss) {
        console.log ('good');
      } else {
        var rootKey;
        for(var prop in result) {
         console.log( prop ); //will give "services"
         rootKey = prop;
        }
        console.log (rootKey);
      }
    });

  } catch(error) {
    return console.error ('Not valid XML for', error);
  }

  var options = {
    siteTags: ['title', 'description', 'author', 'link', 'date', 'pubdate', 'language', 'image', 'categories', 'itunes:explicit'],
    itemTags: ['title', 'description', 'author', 'date', 'pubdate', 'guid', 'image', 'enclosures', 'itunes:explicit', 'itunes:duration', 'itunes:image', 'media:group']
  };

  parser (body, options, function(error, parsed) {
    if (error) {
      return console.error('!!! Feedparser error', error);
    } else {
      return (util.inspect(parsed, showHidden = false, depth = 40, colorize = true));
    }
  });

});
```

(I'm using xml2js to validate documents first, but still the above example slips through—you'll notice a trailing < in the XML.)

The terminal output:

```
{ rss: 
   { '$': 
      { 'xmlns:itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
        version: '2.0' },
     channel: [ [Object] ] } }
good
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: Unexpected end
Line: 203
Column: 2
Char: 
    at error (/Users/me/feeds/node_modules/feedparser/node_modules/sax/lib/sax.js:642:8)
    at end (/Users/me/feeds/node_modules/feedparser/node_modules/sax/lib/sax.js:650:64)
    at Object.SAXParser.end (/Users/me/feeds/node_modules/feedparser/node_modules/sax/lib/sax.js:149:24)
    at SAXStream.end (/Users/me/feeds/node_modules/feedparser/node_modules/sax/lib/sax.js:234:16)
    at FeedParser._flush (/Users/me/feeds/node_modules/feedparser/main.js:1066:17)
    at FeedParser.<anonymous> (/Users/me/feeds/node_modules/feedparser/node_modules/readable-stream/lib/_stream_transform.js:135:12)
    at FeedParser.g (events.js:260:16)
    at emitNone (events.js:67:13)
    at FeedParser.emit (events.js:166:7)
    at finishMaybe (/Users/me/feeds/node_modules/feedparser/node_modules/readable-stream/lib/_stream_writable.js:371:12)
```

I've been using feedparser for hundreds of feeds, and it's working great; only a few edge cases seem to cause a problem. Thanks for your hard work on this great package!

...René

danmactough commented 7 years ago

@renefournier can you check which version? That callback API has not been supported for quite a while.

renefournier commented 7 years ago

@danmactough Hi Dan, I'm using 1.1.5 of your package as a dependency of https://github.com/BiteBit/node-feedparser.git.

danmactough commented 7 years ago

@renefournier I'm sorry, but there's not much I can do.

I'm able to parse that feed just fine, without errors. If you notice a broken feed that causes an error again, save the feed to a file and include that (or a link to it -- maybe a gist) in any issue you open.

I notice, however, that the code you posted above seems to have a bug. Wrapping your call to xml2js in a try/catch won't work, because the parsing is async (presumably -- I don't know the library), so any error surfaces after the try block has already exited.
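
To illustrate the try/catch point, here is a minimal sketch. It assumes a hypothetical callback-style parser, `parseLater`, which is only a stand-in and not the xml2js or feedparser API. Because the callback runs on a later tick, a plain try/catch around the call has already finished by then; one common approach is to wrap the call in a Promise and `await` it inside the try block. The helper names `parseAsPromise` and `validate` are also made up for this example.

```javascript
// Hypothetical stand-in for a callback-style parser; not a real library API.
// It reports its result asynchronously, on a later tick of the event loop.
function parseLater(xml, callback) {
  setImmediate(() => {
    // Toy check only: treat input that does not end in '>' as truncated.
    if (!xml.trim().endsWith('>')) {
      return callback(new Error('Unexpected end'));
    }
    callback(null, { ok: true });
  });
}

// Wrap the callback API in a Promise so failures can be awaited.
function parseAsPromise(xml) {
  return new Promise((resolve, reject) => {
    parseLater(xml, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// With await, the rejection is delivered inside the try block.
async function validate(xml) {
  try {
    await parseAsPromise(xml);
    return 'valid';
  } catch (err) {
    return 'invalid: ' + err.message;
  }
}
```

Calling `validate(body)` before handing `body` to the feed parser would then let the caller skip documents that fail the check, rather than relying on a try/catch that cannot see the asynchronous error.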

Sorry I can't offer more help. But the unreproducible error and the fact that you're using a wrapper library make this issue impossible for me to debug.
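
For reference, the `Unhandled 'error' event` line in the trace is standard Node.js EventEmitter behavior: an `'error'` event emitted with no listener attached is thrown instead. The sketch below shows the mechanism with a hypothetical `StrictParser` Transform stream. It is a stand-in written for this example, not feedparser's actual implementation, and it reports a problem while flushing, which is where the trace suggests the failure occurs (`FeedParser._flush`). The `parse` helper is likewise an assumption.

```javascript
const { Transform } = require('stream');

// Hypothetical stand-in for a streaming parser; it only checks how the input ends.
class StrictParser extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.text = '';
  }

  _transform(chunk, encoding, done) {
    this.text += chunk.toString();
    done();
  }

  _flush(done) {
    // Runs once the writable side has ended; errors here surface as 'error' events.
    if (!this.text.trim().endsWith('>')) {
      return done(new Error('Unexpected end'));
    }
    this.push({ length: this.text.length });
    done();
  }
}

// Attaching an 'error' listener turns the would-be crash into a handled result.
function parse(xml) {
  return new Promise((resolve) => {
    const parser = new StrictParser();
    parser.on('error', (err) => resolve('error: ' + err.message));
    parser.on('data', () => {}); // consume output so 'end' can fire
    parser.on('end', () => resolve('ok'));
    parser.end(xml);
  });
}
```

If the wrapper library does not attach an `'error'` listener to the underlying stream, the caller has no way to catch the failure, which would match the behavior described in this issue.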