danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

an angled bracket in title #165

Open piptan opened 8 years ago

piptan commented 8 years ago

Hi,

If I put the following feed into the library -

`<?xml version="1.0" encoding="UTF-8" ?> <rss version="2.0">

W3Schools Home Page http://www.w3schools.com Free web building tutorials RSS <<<Tutorial>>> http://www.w3schools.com/xml/xml_rss.asp New RSS tutorial on W3Schools

`

The parsed output is -

{
  title: 'RSS >>',
  description: 'New RSS tutorial on W3Schools',
  summary: 'New RSS tutorial on W3Schools',
  date: null,
  pubdate: null,
  pubDate: null,
  link: 'http://www.w3schools.com/xml/xml_rss.asp',
  guid: 'http://www.w3schools.com/xml/xml_rss.asp',
  author: null,
  comments: null,
  origlink: null,
  image: {},
  source: {},
  categories: [],
  enclosures: [],
  'rss:@': {},
  'rss:title': { '@': {}, '#': 'RSS <<<Tutorial>>>' },
  'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' },
  'rss:description': { '@': {}, '#': 'New RSS tutorial on W3Schools' },
}

Please note that `title` contains the wrong text ('RSS >>'), whereas `rss:title` has the correct content ('RSS <<<Tutorial>>>').
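One way to see how a title could end up as 'RSS >>' is to assume that the normalized `title` is passed through a naive tag-stripping step. The sketch below is only an illustration of that assumption: `stripTags` and its regular expression are hypothetical and are not taken from feedparser's source.

```javascript
// Hypothetical reproduction, NOT feedparser's actual code.
// Assumption: the normalized title is cleaned by removing anything
// that looks like an HTML tag, i.e. from a '<' up to the next '>'.
function stripTags(text) {
  // Non-greedy: each match starts at a '<' and ends at the first '>' after it.
  return text.replace(/<.*?>/g, '');
}

const rawTitle = 'RSS <<<Tutorial>>>';

// The single match is '<<<Tutorial>', so only the trailing '>>' survives.
console.log(JSON.stringify(stripTags(rawTitle))); // "RSS >>"
```

If something like this is what happens, it would explain why the raw `rss:title` element keeps the full text while the normalized `title` loses everything between the first `<` and the first `>`.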

theasteve commented 5 years ago

@danmactough is there an option to pass when calling feedparser that removes the `{ '@': {}, '#': value }` wrapper and just returns the value? So instead of `'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' }`, I would get `'rss:link': 'http://www.w3schools.com/xml/xml_rss.asp'`?

danmactough commented 5 years ago

@theasteve 'rss:link' is a "raw" element, meaning it isn't normalized and retains all the information in the original XML. As a result, we need to retain both the attributes (the @) and the text node (the #).

But generally, the item's link property will have the value you want.
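For cases where the raw element is still wanted, the text node can be unwrapped by hand. The following is a minimal sketch that handles only the `{ '@': ..., '#': ... }` shape shown in this thread; `rawText` is a hypothetical helper, not part of feedparser's API, and the `item` object is copied from the parsed output above rather than produced by a real parse.

```javascript
// Hypothetical helper, NOT part of feedparser's API.
// Returns the text node ('#') of a raw element, or null if there is none.
function rawText(node) {
  if (node == null) return null;
  if (typeof node === 'string') return node;
  const text = node['#'];
  return typeof text === 'string' ? text : null;
}

// Sample item, trimmed from the parsed output quoted in this issue.
const item = {
  title: 'RSS >>',
  link: 'http://www.w3schools.com/xml/xml_rss.asp',
  'rss:title': { '@': {}, '#': 'RSS <<<Tutorial>>>' },
  'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' },
};

// Normalized property: usually the value you want.
console.log(item.link);

// Raw element, unwrapped by hand.
console.log(rawText(item['rss:link']));

// Possible workaround for this issue: prefer the raw title, fall back to the normalized one.
console.log(rawText(item['rss:title']) || item.title);
```

Preferring the raw title bypasses whatever normalization is applied to `title`, so any markup in the original element is kept as-is and would need separate handling.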