danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

What happens to the fields unrecognized by feedparser in normalized mode? #216

Closed iq-dot closed 6 years ago

iq-dot commented 7 years ago

As per the title, is it possible to get access to the fields that were not recognized by feedparser?

For example, I have a field 'rss:next_page' that, when normalize is off, is part of the meta at the top level. With normalize on, I can't find it.

I like the normalized output but I need access to that field. Can someone tell me if it is still available and if so where?

The RSS feed looks like this:

<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:fh="http://purl.org/syndication/history/1.0">
    <channel>
        <title>LBC</title>
        <link>http://www.lbc.co.uk</link>
        <description>LBC Videos</description>
        <item>
        <item>
            <guid isPermaLink="false">g4cDQxYjE6qFqPb0VDWqYFbLxcp8-4R2</guid>
            <title>May's Dinner With Juncker: The Inside Story</title>
            <description>This is the alarming inside story of Theresa May's first dinner with EU President Jean-Claude Juncker.</description>
            <link>http://cf.c.ooyala.com/g4cDQxYjE6qFqPb0VDWqYFbLxcp8-4R2/DOcJ-FxaFrRg4gtDEwOjFsaTowODE7WX</link>
            <enclosure url="http://cf.c.ooyala.com/g4cDQxYjE6qFqPb0VDWqYFbLxcp8-4R2/DOcJ-FxaFrRg4gtDEwOjFsaTowODE7WX" length="3342" type="application/x-shockwave-flash" ></enclosure>
            <media:content url="http://cf.c.ooyala.com/g4cDQxYjE6qFqPb0VDWqYFbLxcp8-4R2/DOcJ-FxaFrRg4gtDEwOjFsaTowODE7WX" type="video/x--flv" 
expression="sample" duration="3342" bitrate="3192" lang="eng" ></media:content>
            <media:title type="plain">May's Dinner With Juncker: The Inside Story</media:title>
            <media:description type="html">This is the alarming inside story of Theresa May's first dinner with EU President Jean-Claude Juncker.</media:description>
            <media:thumbnail url="http://cf.c.ooyala.com/g4cDQxYjE6qFqPb0VDWqYFbLxcp8-4R2/promo317917984" width="1280" height="720" time="3342" ></media:thumbnail>
            <media:category label="Entertainment/Celebrity News">Entertainment/Celebrity News</media:category>
            <media:text>This is the alarming inside story of Theresa May's first dinner with EU President Jean-Claude Juncker.</media:text>
            <media:keywords>LBC,Video Feature,Interview,Over 3 mins,AOL One - LBC,Dailymotion - LBC,</media:keywords>
            <media:activation></media:activation>
            <media:expiration></media:expiration>
            <pubDate>Mon, 01 May 2017 14:32:50 GMT</pubDate>
        </item>
        <next_page>
            <![CDATA[http://api.ooyala.com/v2/syndications/23/feed?pcode=44&page_token=1]]>
        </next_page>
    </channel>
</rss>

danmactough commented 6 years ago

That feed is not valid.

The reason you may see 'rss:next_page' is because there are 2 open item elements, and the parser thinks next_page is part of the outer item.
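A quick way to confirm the unbalanced markup is to count opening and closing tags for an element. The sketch below is only an illustration: `countTags` is a hypothetical helper, not part of feedparser, and it uses a naive regular-expression scan rather than a real XML parser. The sample string is a trimmed-down copy of the feed above.

```javascript
// Hypothetical helper (not part of feedparser): counts opening and closing
// tags for one element name so unbalanced markup is easy to spot.
// Naive by design: it ignores comments, CDATA sections, and self-closing tags.
function countTags(xml, name) {
  // Opening tag: "<name" followed by whitespace, ">" or "/".
  const open = xml.match(new RegExp(`<${name}(?=[\\s>/])`, 'g')) || [];
  // Closing tag: "</name>" with optional whitespace before ">".
  const close = xml.match(new RegExp(`</${name}\\s*>`, 'g')) || [];
  return { open: open.length, close: close.length };
}

// Trimmed-down version of the feed from this issue.
const sample = `<channel>
  <item>
  <item>
    <title>May's Dinner With Juncker: The Inside Story</title>
  </item>
  <next_page>http://api.ooyala.com/v2/syndications/23/feed?pcode=44&amp;page_token=1</next_page>
</channel>`;

console.log(countTags(sample, 'item')); // { open: 2, close: 1 }
```

Because one `<item>` is never closed, a lenient parser can end up treating everything up to `</channel>`, including `next_page`, as content of that outer item.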

next_page is not a valid element of an RSS channel (or item, for that matter). XML (and thus RSS) can be extended with namespaces, but that feed doesn't do that.