danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

Not resolving relative link elements, even when feedurl is used #171

Closed TitanChris closed 6 years ago

TitanChris commented 8 years ago

In this feed at ArcGames:

http://www.arcgames.com/en/games/neverwinter/news/rss

the item links are relative URLs. Normally, passing feedurl to FeedParser causes relative URLs to be resolved, but that is not happening here.

TitanChris commented 8 years ago

To reproduce the bug, here is a trimmed portion of the offending feed:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/nolsol.xsl"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">  
  <channel> 
    <title><![CDATA[Neverwinter]]></title>
    <link>/en/games/neverwinter/news</link>  
    <description><![CDATA[]]></description>  
    <language>en_US</language>  
    <lastBuildDate>Mon, 16 May 16 12:39:07 -0700</lastBuildDate>  
    <copyright><![CDATA[
    Copyright © 2016. Perfect World Entertainment Inc, All Rights Reserved.    ]]></copyright>  
    <image> 
      <url>http://images-cdn.perfectworld.com/arc/ea/cd/eacd91ebea924d509f11c6b2fb28a12c1458057528.png</url>
      <title><![CDATA[Neverwinter]]></title>  
      <link>/en/games/neverwinter/news</link>  
    </image>  
    <ttl>60</ttl>
    <atom:link href="/en/games/neverwinter/news/rss" rel="self" type="application/rss+xml"/>
        <item> 
      <title><![CDATA[The Arc Weekly - May 13, 2016]]></title>  
      <description><![CDATA[
Check out highlights from some of the most popular and highly anticipated Arc games, including Neverwinter, Star Trek Online, Gigantic and Livelock.
[snip]
]]></description>  
      <link>/en/games/neverwinter/news/detail/9961173-patch-notes%3A-nw.60.20160410a.8</link>
      <guid isPermaLink="false">http://www.arcgames.com/en/games/PAB_nw/news/detail/9961173</guid>  
      <pubDate>Wed, 11 May 16 17:51:08 -0700</pubDate>
    </item>
    <item> 
      <title><![CDATA[The Siege of Neverwinter Returns!]]></title>  
      <description><![CDATA[
The Cult of the Dragon has amassed their forces once more to lay siege to the city of Neverwinter! Sharpen those blades and prepare to defend the city in the updated Siege of Neverwinter! For a list of changes since the last time the Cult of the Dragon attempted their invasion, head to the bottom of this blog.
[snip]
]]></description>  
      <link>/en/games/neverwinter/news/detail/9958273-the-siege-of-neverwinter-returns%21</link>
      <guid isPermaLink="false">http://www.arcgames.com/en/games/PAB_nw/news/detail/9958273</guid>  
      <pubDate>Wed, 11 May 16 10:27:00 -0700</pubDate>
    </item>
  </channel> 
</rss>

Piping that snippet into the following script on stdin (for example with cat) reproduces the problem: the item.link URLs are not resolved.

var FEED_URL = 'http://www.arcgames.com/en/games/neverwinter/news/rss';

var FeedParser = require('feedparser');

// Small helper class allowing an objectMode stream to be piped to stdout.
var Stream = require('stream');
var Util = require('util');

function StringifyStream() {
  Stream.Transform.call(this, {objectMode: true});
}
Util.inherits(StringifyStream, Stream.Transform);

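// Transform#_transform receives (chunk, encoding, callback); the second
// argument is the encoding, not an error.
StringifyStream.prototype._transform = function(data, encoding, doneCallback) {
  this.push(JSON.stringify(data));
  doneCallback();
};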

var feedparser = new FeedParser({ feedurl: FEED_URL });

process.stdin
  .pipe(feedparser)
  .pipe(new StringifyStream()).pipe(process.stdout)
  ;
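
Until the parser resolves these links itself, one possible workaround is to resolve them after parsing. The following is only a sketch under stated assumptions, not feedparser's own behavior or fix: `resolveLink` and `ResolveLinksStream` are hypothetical names introduced here, and the code relies solely on Node's built-in `url` and `stream` modules.

```javascript
var Stream = require('stream');
var Util = require('util');
var URL = require('url').URL;

var FEED_URL = 'http://www.arcgames.com/en/games/neverwinter/news/rss';

// Hypothetical helper (not part of feedparser): resolve a possibly
// relative link against a base URL. Absolute links pass through unchanged.
function resolveLink(link, baseUrl) {
  if (!link) return link;
  try {
    return new URL(link, baseUrl).href;
  } catch (e) {
    return link; // leave values that cannot be parsed untouched
  }
}

// Hypothetical objectMode transform: rewrites item.link on each parsed item.
function ResolveLinksStream(baseUrl) {
  Stream.Transform.call(this, {objectMode: true});
  this.baseUrl = baseUrl;
}
Util.inherits(ResolveLinksStream, Stream.Transform);

ResolveLinksStream.prototype._transform = function(item, encoding, doneCallback) {
  item.link = resolveLink(item.link, this.baseUrl);
  this.push(item);
  doneCallback();
};

// Assumed usage with the script above:
//   process.stdin
//     .pipe(feedparser)
//     .pipe(new ResolveLinksStream(FEED_URL))
//     .pipe(new StringifyStream())
//     .pipe(process.stdout);
```

This only patches item.link; other relative values in the feed (such as the channel link) would need the same treatment.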