danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

Not parsing a valid feed #209

Closed rodrigopavezi closed 7 years ago

rodrigopavezi commented 7 years ago

I tried to use feedparser to parse this RSS feed, https://www.wired.com/feed, but without success. It throws a "Not a feed" error, as follows:

 at FeedParser.handleEnd (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/feedparser/lib/feedparser/index.js:120:13)
    at emitNone (events.js:86:13)
    at SAXStream.emit (events.js:185:7)
    at SAXParser.SAXStream._parser.onend (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:190:10)
    at emit (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:640:35)
    at end (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:683:5)
    at SAXParser.end (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:154:24)
    at SAXStream.end (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:248:18)
    at FeedParser._flush (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/feedparser/lib/feedparser/index.js:1087:17)
    at FeedParser.<anonymous> (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/feedparser/node_modules/readable-stream/lib/_stream_transform.js:115:49)
Error: Not a feed
    at FeedParser.handleEnd (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/feedparser/lib/feedparser/index.js:120:13)
    at emitNone (events.js:86:13)
    at SAXStream.emit (events.js:185:7)
    at SAXParser.SAXStream._parser.onend (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:190:10)
    at emit (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:640:35)
    at end (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:683:5)
    at SAXParser.end (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:154:24)
    at SAXStream.end (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/sax/lib/sax.js:248:18)
    at FeedParser._flush (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/feedparser/lib/feedparser/index.js:1087:17)
    at FeedParser.<anonymous> (/Users/rodrigopavezi/Workspace/IdeaProjects/risevision/feed-parser/node_modules/feedparser/node_modules/readable-stream/lib/_stream_transform.js:115:49)

Do you know what might be happening? The RSS feed is valid according to the W3C validator:

https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.wired.com%2Ffeed

Cheers

danmactough commented 7 years ago

@rodrigopavezi Usually, this is because the feed is compressed (gzipped) and you haven't decompressed it before passing it to the parser. (See Content-Encoding: gzip in the response headers.)

Please use the compressed example as a starting point.

⇰ curl -i https://www.wired.com/feed
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Origin: *
Cache-Control: stale-while-revalidate=86400, stale-while-error=86400
Content-Encoding: gzip
Content-Type: text/xml; charset=UTF-8
ETag: "16f2a023597d9eb49b8e0b5fea3a46ae-gzip"
Last-Modified: Wed, 05 Apr 2017 00:45:34 GMT
Link: <https://www.wired.com/wp-content/themes/Phoenix/assets/css/style.css?ver=1491339624>; rel=preload; as=stylesheet
Link: <https://www.wired.com/wp-json/>; rel="https://api.w.org/"
Server: Apache
Via: 1.1 varnish
Fastly-Debug-State: HIT
Fastly-Debug-Digest: e65399f50403665a37b112fe1f0ee20933475e8f0a1fad894decd9759d0622fb
Content-Length: 2909
Accept-Ranges: bytes
Date: Wed, 05 Apr 2017 02:09:17 GMT
Via: 1.1 varnish
Age: 3517
Connection: keep-alive
X-Served-By: cache-jfk8150-JFK, cache-iad2141-IAD
X-Cache: HIT, HIT
X-Cache-Hits: 37, 65
X-Timer: S1491358158.704102,VS0,VE0
Content-Security-Policy: default-src https: data: 'unsafe-inline' 'unsafe-eval'; child-src https: data: blob:; connect-src https: data: blob:; font-src https: data:; img-src https: data:; media-src blob: https:; object-src https:; script-src https: data: blob: 'unsafe-inline' 'unsafe-eval'; style-src https: 'unsafe-inline'; block-all-mixed-content; upgrade-insecure-requests; report-uri https://capture.condenastdigital.com/csp/wired
Strict-Transport-Security: max-age=31536000; preload
...
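
For readers who want to see the idea in code, here is a minimal sketch of decompressing a response body according to its Content-Encoding header before handing it to the parser. It is not the repository's compressed example; `maybeDecompress` is a hypothetical helper name, and only Node's built-in `zlib` module is used.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of feedparser): given a readable stream
// `body` and the value of the Content-Encoding response header, return a
// stream that yields the decompressed bytes.
function maybeDecompress(body, contentEncoding) {
  const encoding = (contentEncoding || '').trim().toLowerCase();

  if (encoding === 'gzip' || encoding === 'x-gzip') {
    // gzip-compressed body, as in the wired.com response headers
    return body.pipe(zlib.createGunzip());
  }
  if (encoding === 'deflate') {
    // zlib-wrapped deflate body
    return body.pipe(zlib.createInflate());
  }
  // No (or unrecognized) encoding: pass the body through untouched.
  return body;
}

// Note: .pipe() does not forward 'error' events from `body` to the
// returned stream, so attach error handlers to both streams.
```

Assuming `res` is the HTTP response stream, the result could then be piped into the parser, along the lines of `maybeDecompress(res, res.headers['content-encoding']).pipe(new FeedParser())`. The key point is that the parser only ever sees plain XML bytes.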