danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

HTML entities stripped from item title #243

Open autonome opened 6 years ago

autonome commented 6 years ago

Eg: "&gt;"

I'm parsing https://groups.google.com/forum/feed/mozilla.dev.platform/topics/rss.xml?num=50

The item titled "Intent to unship: <a> as <area> in image maps" has those entities encoded as &lt; and &gt; respectively.

However, in the code example below, the entities are missing from item.title, as is detectable from the length:

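const request = require('request');
const FeedParser = require('feedparser');

let url = 'https://groups.google.com/forum/feed/mozilla.dev.platform/topics/rss.xml?num=50';
let req = request(url);
let feedparser = new FeedParser();

// Pipe the HTTP response body into the parser.
req.on('response', function (res) {
  this.pipe(feedparser);
});

feedparser.on('readable', function () {
  let item;
  // read() returns null once the buffer is drained.
  while ((item = this.read())) {
    console.log(item.title.length);
  }
});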

danmactough commented 6 years ago

@autonome Thanks for opening this issue.

This stripping is being done intentionally (https://github.com/danmactough/node-feedparser/commit/de3566b5104953d55f759eb58715bcc3d3857bc2) -- however, I can't actually remember why. 😬 Presumably, the idea was to avoid handing people an XSS injection foot-gun.
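As a rough illustration of what tag stripping does to this particular title, here is a minimal sketch assuming a naive regex-based stripper. `stripTags` is a hypothetical helper written for this example; feedparser's own implementation may differ.

```javascript
// Hypothetical helper (NOT feedparser's actual code): remove anything that
// looks like an HTML tag from a string.
function stripTags(text) {
  return text.replace(/<[^>]*>/g, '');
}

// The title after the XML entities have been decoded by the parser.
const decodedTitle = 'Intent to unship: <a> as <area> in image maps';
const strippedTitle = stripTags(decodedTitle);

console.log(JSON.stringify(strippedTitle));
// prints "Intent to unship:  as  in image maps"
console.log(decodedTitle.length, strippedTitle.length);
// prints 45 36
```

The doubled spaces left behind where `<a>` and `<area>` used to be are what make the missing text easy to miss by eye but visible in the length.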

Note that the un-stripped title is available: item['rss:title']['#'].

{ title: 'Intent to unship:  as  in image maps',
  description: 'Hi, In bug 1317937 I intend to unship the feature of <a> elements acting the same way as <area> elements in image maps. This functionality was specced in HTML 4, but no other browser implemented it and was removed from HTML 5. Timothy (:tnikkel) tried to do it before, but it got blocked on',
  summary: 'Hi, In bug 1317937 I intend to unship the feature of <a> elements acting the same way as <area> elements in image maps. This functionality was specced in HTML 4, but no other browser implemented it and was removed from HTML 5. Timothy (:tnikkel) tried to do it before, but it got blocked on',
  date: 2017-11-08T23:50:27.000Z,
  pubdate: 2017-11-08T23:50:27.000Z,
  pubDate: 2017-11-08T23:50:27.000Z,
  link: 'https://groups.google.com/d/msg/mozilla.dev.platform/JUB5K-sz6ek/F4hQWdDRBQAJ',
  guid: 'https://groups.google.com/d/topic/mozilla.dev.platform/JUB5K-sz6ek',
  author: 'Emilio Cobos Álvarez',
  comments: null,
  origlink: null,
  image: {},
  source: {},
  categories: [],
  enclosures: [],
  'rss:@': {},
  'rss:title':
   { '@': {},
     '#': 'Intent to unship: <a> as <area> in image maps' },
  'rss:link':
   { '@': {},
     '#': 'https://groups.google.com/d/msg/mozilla.dev.platform/JUB5K-sz6ek/F4hQWdDRBQAJ' },
  'rss:description':
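
The workaround can be sketched as a small helper. `rawTitle` is a hypothetical name, not part of feedparser. The sample item copies only the `title` and `rss:title` fields from the dump, and the fallback to `item.title` is an assumption for items that carry no raw RSS title.

```javascript
// Hypothetical helper (not part of feedparser): prefer the raw, un-stripped
// RSS title and fall back to the normalized title when it is absent.
function rawTitle(item) {
  const raw = item['rss:title'] && item['rss:title']['#'];
  return typeof raw === 'string' ? raw : item.title;
}

// Item shaped like the dump, trimmed to the relevant fields.
const sampleItem = {
  title: 'Intent to unship:  as  in image maps',
  'rss:title': { '@': {}, '#': 'Intent to unship: <a> as <area> in image maps' },
};

console.log(rawTitle(sampleItem));
// prints Intent to unship: <a> as <area> in image maps
console.log(sampleItem.title.length, rawTitle(sampleItem).length);
// prints 36 45
```

Inside a `readable` handler, the same helper could be called on each item returned by `this.read()`.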
autonome commented 6 years ago

Thanks for the quick reply Dan! Great to know there's a workaround :D

Totally understand the footgun concern. Difficult trade-off against correctness though. I'm not sure the trade-off is worth it, as the content is already escaped.

IIUC, the XSS risk would not occur by printing this content directly to a screen. The risk would occur only if someone first unescapes the content and then prints it.
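The escaped-versus-unescaped distinction can be sketched as follows. `escapeHtml` is a hypothetical helper written for this example, and the two strings are assumed forms of the title: entity-encoded as in the feed XML, and decoded as it appears under `item['rss:title']['#']` in the dump earlier in the thread.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
// '&' is replaced first so later replacements are not double-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Entity-encoded form: contains no literal angle brackets, so a browser
// renders it as text rather than interpreting it as markup.
const encoded = 'Intent to unship: &lt;a&gt; as &lt;area&gt; in image maps';

// Decoded form: contains literal tags, so inserting it into HTML unescaped
// is where the injection risk would come from.
const decoded = 'Intent to unship: <a> as <area> in image maps';

console.log(encoded.includes('<'));            // prints false
console.log(decoded.includes('<'));            // prints true
console.log(escapeHtml(decoded) === encoded);  // prints true
```

Under these assumptions, a consumer that wants the full title in an HTML page would escape the decoded value at output time instead of relying on the parser to strip it.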

On Thu, Nov 23, 2017 at 9:25 AM Dan MacTough wrote:

@autonome Thanks for opening this issue.

This stripping is being done intentionally (https://github.com/danmactough/node-feedparser/commit/de3566b5104953d55f759eb58715bcc3d3857bc2) -- however, I can't actually remember why. 😬 Presumably, the idea was to avoid handing people an XSS injection foot-gun.

Note that the un-stripped title is available: item['rss:title']['#'].


danmactough commented 6 years ago

related to #165

autonome commented 6 years ago

Wonderful, thanks @danmactough!