dylang / node-rss

RSS feed generator for Node.
https://npmjs.org/package/rss
MIT License

How to remove CDATA from title and description? (stitcherFM and w3validator) #97

Open vincentntang opened 3 years ago

vincentntang commented 3 years ago

How do I remove the CDATA from here? Here's the RSS feed we're using: https://www.codechefs.dev/rss.xml

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
<channel>
<title>
<![CDATA[ Code Chefs - Hungry Web Developer Podcast ]]>
</title>
<description>
<![CDATA[ Looking to expand your skills as a Web Developer? Vincent Tang and German Gamboa break down topics in Javascript, NodeJS, CSS, DevOps, AWS, and career development! ]]>
</description>

I'm running into issues getting this feed accepted by https://www.stitcher.com/ which validates feeds with https://validator.w3.org/

I'm using GatsbyJS, whose gatsby-plugin-feed uses node-rss behind the scenes.

I read elsewhere that if I remove the CDATA for title, it should fix RSS feed issues. https://github.com/dylang/node-rss/issues/71

williamwgant commented 3 years ago

I'm seeing the same issue. Did you have any luck sorting it out?

dmythro commented 3 years ago

Just passing by... but I have no idea how removing CDATA would fix validation: the feed is already valid, and CDATA is a standard approach that has worked this way for ages (it was the same 20 years ago). Are you sure that is the actual problem?
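
To illustrate the point, here is a minimal sketch of why the two forms should be equivalent to a conforming XML parser: CDATA content is taken literally, while escaped content has its entities decoded, and both yield the same text. The helpers `textFromCdata` and `textFromEscaped` are hypothetical names made up for this illustration; they are not part of node-rss and only mimic what a real parser does.

```javascript
// Hypothetical helpers for illustration only; not part of node-rss.

// Text a parser would report for a CDATA section: the content, taken literally.
function textFromCdata(xml) {
  const match = /^<!\[CDATA\[([\s\S]*?)\]\]>$/.exec(xml);
  if (!match) throw new Error("not a single CDATA section");
  return match[1];
}

// Text a parser would report for entity-escaped content: the five
// predefined XML entities are decoded, with &amp; handled last.
function textFromEscaped(xml) {
  return xml
    .replaceAll("&lt;", "<")
    .replaceAll("&gt;", ">")
    .replaceAll("&quot;", '"')
    .replaceAll("&apos;", "'")
    .replaceAll("&amp;", "&");
}

const viaCdata = textFromCdata("<![CDATA[Tips & Tricks <for> devs]]>");
const viaEscaped = textFromEscaped("Tips &amp; Tricks &lt;for&gt; devs");
console.log(viaCdata === viaEscaped); // true
```

So a consumer that rejects one form but accepts the other is probably not using a full XML parser.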

williamwgant commented 3 years ago

To be honest, I'm not sure. I've seen different things on this, and the validator I just tried to use didn't call it out. So maybe it's ok?

The validator did find a lot of other dumb stuff I did (with more complicated issues), so I would expect that it is ok. But I haven't tried pushing it into stitcher yet, as I'm moving an existing podcast feed.

dmythro commented 3 years ago

A long time ago I wrote a specialised CMS with RSS feeds in PHP, and back then all the output, including RSS, was generated by hand. All the plain text/HTML was inside CDATA, and it still works like that, with no issues in Feedly or the other readers I've used over time. So if a validator flags it, that's weird. I checked the feed with the W3C validator and it reports no issues with CDATA. It does report other issues, as the feed is a bit outdated, but that's alright :)

tazwar9t63 commented 2 years ago

How did you solve it? I'm facing the same issue, @vincentntang

FerrariAndrea commented 1 year ago

Same issue here: some RSS validators don't like the <![CDATA[ wrapper. (For now I'm testing the string output, not the XML served from a server; by the end of this message you will see why I point that out.)

If I end up needing to remove the wrappers for my use case, I will write an algorithm that finds and removes <![CDATA[ from the output string (it should be easy to do).

Something like:

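  // `feed` is assumed to be an existing node-rss instance.
  const rss_data = feed.xml({ indent: true });
  // CDATA content is literal, so once the wrapper is gone it must be escaped.
  // Escaping only that content avoids double-escaping entities elsewhere.
  const escape_text = (s) =>
    s.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;");
  const skip_item = false; // true: leave everything from the first <item> untouched
  let eof_target = rss_data.length;
  if (skip_item) {
    const first_item = rss_data.indexOf("<item>");
    if (first_item > -1) eof_target = first_item;
  }
  let offset = 0;
  let buffer = "";
  while (offset < eof_target) {
    const start_i_title = rss_data.indexOf("<title>", offset);
    const start_i_desc = rss_data.indexOf("<description>", offset);
    // Pick whichever tag comes first; -1 means "not found".
    let start_i = start_i_title;
    let xml_tag = "</title>";
    if (start_i_desc > -1 && (start_i_title === -1 || start_i_desc < start_i_title)) {
      start_i = start_i_desc;
      xml_tag = "</description>";
    }
    const end_i = start_i > -1 ? rss_data.indexOf(xml_tag, start_i + 1) : -1;
    if (start_i === -1 || start_i >= eof_target || end_i === -1) {
      break; // no more matching elements before eof_target
    }
    const text = rss_data.substring(start_i, end_i);
    // Unwrap each CDATA section and escape what was inside it.
    const cleaned = text.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, (_, inner) => escape_text(inner));
    buffer += rss_data.substring(offset, start_i) + cleaned;
    offset = end_i;
  }
  // Copy the remainder unchanged; `buffer` now holds the feed without
  // CDATA in <title> and <description>.
  buffer += rss_data.substring(offset);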
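
A more compact variant of the same idea, as a sketch only: it uses regular expressions instead of index scanning and works on the string output, not on a parsed document. The names `escapeXmlText` and `stripCdata` are hypothetical helpers, not part of node-rss, and the sketch assumes `<title>` and `<description>` carry no attributes and that their content does not itself contain a literal closing tag.

```javascript
// Hypothetical helpers, not part of node-rss.

// Escape text that used to be protected by a CDATA wrapper.
function escapeXmlText(text) {
  return text
    .replaceAll("&", "&amp;") // must run first so later entities are not re-escaped
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;");
}

// Unwrap CDATA sections inside <title> and <description> elements only.
function stripCdata(xml) {
  return xml.replace(
    /<(title|description)>([\s\S]*?)<\/\1>/g,
    (_match, tag, body) => {
      const unwrapped = body.replace(
        /<!\[CDATA\[([\s\S]*?)\]\]>/g,
        (_cdata, inner) => escapeXmlText(inner)
      );
      return `<${tag}>${unwrapped}</${tag}>`;
    }
  );
}

const sample =
  "<channel><title><![CDATA[Tom & Jerry]]></title>" +
  "<description><![CDATA[<b>Cat</b> and mouse]]></description></channel>";
console.log(stripCdata(sample));
```

With node-rss this would be applied to the generated string, for example `stripCdata(feed.xml({ indent: true }))`. Text outside CDATA sections is left as it is, so entities that are already escaped are not escaped twice.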

Be careful with special characters, everyone: if you remove <![CDATA[, you need to sanitize the string, as I did in the algorithm for "&" with .replaceAll("&", "&amp;"). I still need to study the RSS standard, but I think the validators are wrong, not this repo.

For example, with this one (https://www.rssfeedexpert.com/ToolsRSSFeedXMLFormatter.aspx) I noticed that if you validate the normal output text of feed.xml, the validator reports "This is NOT a valid XML document". But if you edit the text, for example by adding a space somewhere, and then click "collect", it shows "Success - See reformatted text below" and parses the XML correctly :sweat:

I will post an update once I have tested the RSS output directly from the URL instead of validating the string output.

Update: I'm using Google News as a reader, consuming the RSS output directly from the URL, and everything works fine there.

Dentrax commented 1 month ago

Any update on this?