danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

Can't access any image from feed. #101

Closed scottmcpherson closed 10 years ago

scottmcpherson commented 10 years ago

For some reason, I'm unable to access the items' image URLs and titles, although everything else loads. When I console.log item.image.url I get undefined, and the same happens with item.image.title.

        var FeedParser = require('feedparser');
        var feedparser = new FeedParser();
        // ... feed request omitted; its response stream is piped into feedparser ...

        feedparser.on('readable', function () {
            // `this` is the parser stream; read every item currently buffered
            var stream = this, item;
            while (item = stream.read()) {
                console.log('Got article: %s', item.title || item.description);
                console.log('article link: ', item.link);
                console.log('article date: ', item.pubdate);
                console.log('article image url: ' + item.image.url); // this is logging undefined
            }
        });

Any ideas? I've tried several RSS feeds with no luck.

danmactough commented 10 years ago

@scott-mcpherson I think it's safe to say that most feeds won't have any image properties associated with the feed items. It's not a property defined by any feed spec; I've included it primarily to provide a convenient accessor to iTunes and Yahoo Media URL properties: https://github.com/danmactough/node-feedparser/blob/master/main.js#L978-L990
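Because `item.image` is only filled in when a feed supplies one of those elements, code that reads it should expect the URL and title to be missing. A minimal sketch, assuming only that an item may carry an `image` object with optional `url` and `title` strings (the helper name `getItemImage` and the sample items are hypothetical):

```javascript
// Read item.image defensively: `image` itself may be absent, and its
// `url` / `title` may be missing or empty, so every access is guarded.
function getItemImage(item) {
  const image = (item && item.image) || {};
  return {
    url: typeof image.url === 'string' && image.url ? image.url : null,
    title: typeof image.title === 'string' && image.title ? image.title : null,
  };
}

// Hypothetical plain objects shaped like parsed feed items.
const withImage = { title: 'Episode 1', image: { url: 'https://example.com/art.jpg', title: 'Cover art' } };
const withoutImage = { title: 'Plain post', image: {} };

console.log(getItemImage(withImage));    // { url: 'https://example.com/art.jpg', title: 'Cover art' }
console.log(getItemImage(withoutImage)); // { url: null, title: null }
```

Returning `null` rather than `undefined` makes "no image in this feed" an explicit, checkable result instead of a silent gap in the log output.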

Do you have a feed that you think should have an item.image? If so, please post the url so I can check it out.

scottmcpherson commented 10 years ago

@danmactough Thanks, I wasn't aware of that. I thought image properties would be included in RSS data. I wonder how an RSS reader like Feedly gets the images from feeds. Perhaps they scrape the feeds? Anyway, here's one of the feeds I was trying to extract images from: http://feeds.gawker.com/lifehacker/full.xml

danmactough commented 10 years ago

I have no idea exactly how Feedly does it, but there are several options:

  1. Analyze the items' enclosures: some feeds use the enclosure property for a thumbnail image.
  2. Analyze the items' media:* elements: lots of feeds put thumbnail images there.
  3. Parse the items' descriptions and extract any <img> tags.
  4. Use a third-party service like Embedly to get rich metadata (often including a thumbnail) about each item.
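Options 1 and 2 can be combined into a simple fallback chain. This is a sketch under stated assumptions, not feedparser's documented API: it assumes enclosures arrive as `{ url, type }` objects in `item.enclosures`, and that namespaced elements such as `media:thumbnail` and `media:content` are exposed as `item['media:thumbnail']` with their XML attributes under an `'@'` key, as a single object or an array. Check these shapes against real parsed items before relying on them. All helper names and sample items are hypothetical.

```javascript
// ASSUMED item shapes (verify against real parsed items):
//   item.enclosures          -> [{ url, type }, ...]
//   item['media:thumbnail']  -> { '@': { url } } or an array of those
//   item['media:content']    -> { '@': { url, type, medium } } or an array

// Normalize "one node or many" into an array.
function asArray(value) {
  if (value == null) return [];
  return Array.isArray(value) ? value : [value];
}

// Option 1: first enclosure whose MIME type looks like an image.
function imageFromEnclosures(item) {
  const enclosure = asArray(item.enclosures).find(
    (e) => e && typeof e.url === 'string' && /^image\//i.test(e.type || '')
  );
  return enclosure ? enclosure.url : null;
}

// Option 2: media:thumbnail first, then an image-typed media:content.
function imageFromMedia(item) {
  for (const node of asArray(item['media:thumbnail'])) {
    const attrs = (node && node['@']) || {};
    if (attrs.url) return attrs.url;
  }
  for (const node of asArray(item['media:content'])) {
    const attrs = (node && node['@']) || {};
    const isImage = attrs.medium === 'image' || /^image\//i.test(attrs.type || '');
    if (attrs.url && isImage) return attrs.url;
  }
  return null;
}

// Try each source in turn; null means no candidate image was found.
function findThumbnail(item) {
  return (item.image && item.image.url) || imageFromEnclosures(item) || imageFromMedia(item) || null;
}

// Hypothetical items shaped as assumed above.
const podcastItem = {
  enclosures: [
    { url: 'https://example.com/ep.mp3', type: 'audio/mpeg' },
    { url: 'https://example.com/ep.jpg', type: 'image/jpeg' },
  ],
};
const mediaItem = { enclosures: [], 'media:thumbnail': { '@': { url: 'https://example.com/thumb.png' } } };
const plainItem = { enclosures: [] };

console.log(findThumbnail(podcastItem)); // https://example.com/ep.jpg
console.log(findThumbnail(mediaItem));   // https://example.com/thumb.png
console.log(findThumbnail(plainItem));   // null
```

The order (explicit image field, then enclosures, then media elements) is a judgment call; preferring `media:thumbnail` over enclosures would be just as reasonable.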

That Gawker feed you posted appears to be parsed as expected, so I'm going to close this issue.

martinmurciego commented 4 years ago

I've seen this in Mailtrain, a tool that uses this module: when I use a WordPress plugin to add the featured image to the RSS feed, Mailtrain includes that image along with the summary. Here is an item from such a feed; the featured image arrives as an <img> tag at the start of the description:

    <title>El 18 de febrero se realizará la Jornada Internacional de Enfermería Familiar y Comunitaria</title>
    <link>https://salud.misiones.gob.ar/el-18-de-febrero-se-realizara-la-jornada-internacional-de-enfermeria-familiar-y-comunitaria/</link>
    <pubDate>Mon, 10 Feb 2020 16:44:22 +0000</pubDate>
    <dc:creator><![CDATA[Jonatan]]></dc:creator>
    <category><![CDATA[Noticias]]></category>
    <guid isPermaLink="false">https://salud.misiones.gob.ar/?p=15464</guid>
    <description><![CDATA[<img width="150" height="150" src="https://salud.misiones.gob.ar/wp-content/uploads/2020/02/jornadas-internacional-de-enfermeria-1-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" />El 18 de febrero, a partir de las 8 horas, se realizará la Jornada Internacional de Enfermería Familiar y Comunitaria en el salón auditorio de la Escuela de Enfermería de la UNaM. En este encuentro disertaran las Enfermeras Ana Romero García, Residente de Enfermería Familiar y Comunitaria de la Unidad Docente de Girona ICS- España. [&#8230;]]]></description>
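In this item the featured image is embedded in the HTML of the description, so option 3 from the earlier reply applies. A rough sketch, assuming the description HTML is available as a string (with feedparser, presumably `item.description`). It uses a regular expression, which is only a heuristic for HTML; a real HTML parser would be sturdier. The helper name `firstImageSrc` is hypothetical and the sample string is abridged from the item above.

```javascript
// Return the src of the first <img> tag in an HTML string, or null.
// Regex heuristic: handles single or double quotes and any attribute order
// before src, but not every HTML edge case.
function firstImageSrc(html) {
  if (typeof html !== 'string') return null;
  const match = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i.exec(html);
  return match ? match[1] : null;
}

// Abridged from the WordPress feed item above.
const description =
  '<img width="150" height="150" src="https://salud.misiones.gob.ar/wp-content/uploads/2020/02/jornadas-internacional-de-enfermeria-1-150x150.jpg" ' +
  'class="webfeedsFeaturedVisual wp-post-image" alt="" />El 18 de febrero, a partir de las 8 horas, ...';

console.log(firstImageSrc(description));
// https://salud.misiones.gob.ar/wp-content/uploads/2020/02/jornadas-internacional-de-enfermeria-1-150x150.jpg
```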