danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

Bad redirection (301) problem #287

yPhil-gh closed this issue 3 years ago

yPhil-gh commented 3 years ago

Hi, I'm having a problem with a feed URL that is not redirecting properly.

URL of the feed: https://www.h24info.ma/feed
FeedParser version: 2.2.10
Node version: 14.15.3
NPM version: 7.6.0

There is clearly a problem with the URL itself, as it makes wget loop forever:

# wget https://www.h24info.ma/feed
--2021-03-20 12:19:18--  https://www.h24info.ma/feed
Resolving www.h24info.ma (www.h24info.ma)... 104.21.39.120, 172.67.145.67, 2606:4700:3031::ac43:9143, ...
Connecting to www.h24info.ma (www.h24info.ma)|104.21.39.120|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.h24info.ma/feed/ [following]
--2021-03-20 12:19:18--  https://www.h24info.ma/feed/
Reusing existing connection to www.h24info.ma:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/rss+xml]
Saving to: ‘feed.2’

feed.2                               [ <=>                                                    ]  12.09K  --.-KB/s    in 0.02s   

2021-03-20 12:19:24 (518 KB/s) - Read error at byte 12383 (Success).Retrying.

--2021-03-20 12:19:25--  (try: 2)  https://www.h24info.ma/feed/
Connecting to www.h24info.ma (www.h24info.ma)|104.21.39.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/rss+xml]
Saving to: ‘feed.2’

feed.2                               [ <=>                                                    ]  12.09K  --.-KB/s    in 0.02s   

2021-03-20 12:19:31 (501 KB/s) - Read error at byte 12383 (Success).Retrying.

--2021-03-20 12:19:33--  (try: 3)  https://www.h24info.ma/feed/
Connecting to www.h24info.ma (www.h24info.ma)|104.21.39.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/rss+xml]
Saving to: ‘feed.2’
(...) repeated infinitely

Here are the (entire) response headers returned by curl:

# curl -I https://www.h24info.ma/feed
HTTP/2 301 
date: Sat, 20 Mar 2021 11:20:08 GMT
content-type: application/rss+xml; charset=UTF-8
set-cookie: __cfduid=ddff178f84ae3564a19a506e6770817e81616239208; expires=Mon, 19-Apr-21 11:20:08 GMT; path=/; domain=.h24info.ma; HttpOnly; SameSite=Lax
vary: Accept-Encoding,Cookie,User-Agent
x-redirect-by: WordPress
last-modified: Sat, 20 Mar 2021 10:48:51 GMT
location: https://www.h24info.ma/feed/
cache-control: max-age=2592000
expires: Mon, 19 Apr 2021 11:20:08 GMT
cf-cache-status: DYNAMIC
cf-request-id: 08f0f63ea90000ff644a9e1000000001
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=ff1ZDtKFUu4zqFEwajoLi4qyf3zhwxqMolxSZGzB1PKoGRPveffiANXBvbAaRi2gaO4aucB0RYP31Yen78VCCe4FeAmvVuuRiPwbwLxHNQ%3D%3D"}],"max_age":604800,"group":"cf-nel"}
nel: {"max_age":604800,"report_to":"cf-nel"}
server: cloudflare
cf-ray: 632e8caaafa5ff64-MAD
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400

My code:

const fetch = require('node-fetch');
const FeedParser = require('feedparser');
// getParams() and maybeTranslate() are helpers defined elsewhere (not shown).

function getFeed (feedUrl, callback) {
  // Get a response stream; request headers must go under the `headers` option
  fetch(feedUrl, {
    headers: {
      'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36',
      'accept': 'text/html,application/xhtml+xml'
    }
  }).then(function (res) {

    console.error('node-fetch status: %s', res.status);

    // Bail out before setting up the parser if the request did not succeed
    if (res.status !== 200) {
      return callback(res.status);
    }

    // Set up the feedparser stream
    var feedparser = new FeedParser();
    var feedItems = [];
    feedparser.on('error', function (err) {
      console.error('## ERR feedUrl: %s (%s): %s', feedUrl, res.status, err.message);
      return callback(err);
    });
    feedparser.on('readable', function () {
      // Drain every item that is currently buffered
      var item;
      while ((item = this.read()) !== null) feedItems.push(item);
    });
    // Single 'end' handler (the original also registered an undefined `done`)
    feedparser.on('end', function () {
      var meta = this.meta;
      return callback(null, feedItems, meta.title, meta.link);
    });

    var charset = getParams(res.headers.get('content-type') || '').charset;
    var responseStream = res.body;
    responseStream = maybeTranslate(responseStream, charset);
    responseStream.pipe(feedparser);

  }).catch(function (err) {
    console.error('## ERR (%s)', err.message);
    return callback(err);
  });
}

Finer inspection reveals a pending promise; how can I resolve it and move on?

I tried using redirect: 'manual' in the fetch call, but it makes a lot of otherwise fine feed URLs fail. I'm really not sure whether I have to track and handle those redirect problems at the fetch level or at the FeedParser level, since it follows redirects... Is there a way to detect that the redirect fails?
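On the redirect: 'manual' point: with that option the caller has to follow each Location header itself, which may be why otherwise fine feeds started failing. Below is a minimal sketch of following redirects by hand with a hop limit and loop detection. The names followRedirects, fetchFn, and maxHops are illustrative only, and fetchFn is assumed to behave like node-fetch in manual mode (a 3xx status plus a readable location header).

```javascript
// Sketch: follow redirects by hand so every hop can be inspected.
// `fetchFn` is any fetch-compatible function (for example node-fetch),
// passed in as a parameter; all names here are hypothetical.
async function followRedirects(fetchFn, url, options = {}, maxHops = 5) {
  const seen = new Set();
  let current = url;

  for (let hop = 0; hop <= maxHops; hop++) {
    // Visiting the same URL twice means the server is redirecting in a circle
    if (seen.has(current)) {
      throw new Error('Redirect loop detected at ' + current);
    }
    seen.add(current);

    const res = await fetchFn(current, { ...options, redirect: 'manual' });
    const isRedirect = [301, 302, 303, 307, 308].includes(res.status);
    if (!isRedirect) {
      return { res, finalUrl: current };
    }

    const location = res.headers.get('location');
    if (!location) {
      throw new Error('Redirect without Location header at ' + current);
    }
    // Location may be relative, so resolve it against the current URL
    current = new URL(location, current).toString();
  }

  throw new Error('Too many redirects (more than ' + maxHops + ')');
}
```

A call such as followRedirects(fetch, feedUrl, { headers }) would then reject on a loop or an excessive chain, and otherwise resolve with the final response, whose status can be checked as before.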

danmactough commented 3 years ago

Sorry, but this is not an issue with Feedparser. At first glance, it just looks like the remote server is incorrectly configured. You'll have to decide how you want to manage that.
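One possible way to manage it on the client side is an idle timeout on the response body. The wget log earlier in the thread shows the 301 being followed to a 200 response, with the failure appearing as a read error while the body is streamed, so a stalled body is a plausible cause of the pending promise. The sketch below uses only Node's built-in stream module; idleTimeout and idleMs are hypothetical names, not part of feedparser or node-fetch.

```javascript
const { Transform } = require('stream');

// Sketch: a pass-through stream that destroys itself with an error when no
// chunk arrives for `idleMs` milliseconds. Names are illustrative only.
function idleTimeout(idleMs) {
  let timer = null;

  const guard = new Transform({
    transform(chunk, encoding, done) {
      arm();              // every chunk restarts the countdown
      done(null, chunk);  // forward the data unchanged
    },
    flush(done) {
      clearTimeout(timer); // source ended normally
      done();
    }
  });

  function arm() {
    clearTimeout(timer);
    timer = setTimeout(function () {
      guard.destroy(new Error('No data received for ' + idleMs + ' ms'));
    }, idleMs);
  }

  // Never leave a timer behind once the stream is closed
  guard.on('close', function () { clearTimeout(timer); });

  arm();
  return guard;
}
```

It could be placed between the response and the parser, for example res.body.pipe(idleTimeout(10000)).pipe(feedparser). Because pipe() does not forward errors, the guard needs its own 'error' listener, which would call the callback and could also destroy res.body to release the connection. A slow consumer can also trip the timer, so the limit should be generous.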