matthew-carroll opened 5 months ago
I think the existing RSS 1.0 data model is incorrect. Here's an RSS 1.0 basic example from the test directory:
```xml
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
>
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>XML.com</title>
    <link>http://xml.com/pub</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif"/>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://xml.com/pub/2000/08/09/xslt/xslt.html"/>
        <rdf:li resource="http://xml.com/pub/2000/08/09/rdfdb/index.html"/>
      </rdf:Seq>
    </items>
    <textinput rdf:resource="http://search.xml.com"/>
  </channel>
  <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
    <title>XML.com</title>
    <link>http://www.xml.com</link>
    <url>http://xml.com/universal/images/xml_tiny.gif</url>
  </image>
  <item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
    <title>Processing Inclusions with XSLT</title>
    <link>http://xml.com/pub/2000/08/09/xslt/xslt.html</link>
    <description>Processing document inclusions with general XML tools can be problematic. This article proposes a way of preserving inclusion information through SAX-based processing.</description>
  </item>
  <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
    <title>Putting RDF to Work</title>
    <link>http://xml.com/pub/2000/08/09/rdfdb/index.html</link>
    <description>
      Tool and API support for the Resource Description Framework
      is slowly coming of age. Edd Dumbill takes a look at RDFDB,
      one of the most exciting new RDF toolkits.
    </description>
  </item>
  <textinput rdf:about="http://search.xml.com">
    <title>Search XML.com</title>
    <description>Search XML.com's XML collection</description>
    <name>s</name>
    <link>http://search.xml.com</link>
  </textinput>
</rdf:RDF>
```
Here's the spec for RSS 1.0: https://validator.w3.org/feed/docs/rss1.html#s5.5
Yet, here's the property list from `rss1_feed.dart`:

```dart
final String? title;
final String? description;
final String? link;
final String? image;
final List<Rss1Item> items;
final UpdatePeriod? updatePeriod;
final int? updateFrequency;
final DateTime? updateBase;
final DublinCore? dc;
```
The parsing behavior is as follows:

```dart
final document = XmlDocument.parse(xmlString);
XmlElement rdfElement;
try {
  rdfElement = document.findAllElements('rdf:RDF').first;
} on StateError {
  throw ArgumentError('channel not found');
}
final channel = rdfElement.findElements('channel');
return Rss1Feed(
  title: findElementOrNull(rdfElement, 'title')?.innerText,
  link: findElementOrNull(rdfElement, 'link')?.innerText,
  description: findElementOrNull(rdfElement, 'description')?.innerText,
  items: rdfElement.findElements('item').map((element) => Rss1Item.parse(element)).toList(),
  image: findElementOrNull(rdfElement, 'image')?.getAttribute('rdf:resource'),
  updatePeriod: _parseUpdatePeriod(
    findElementOrNull(rdfElement, 'sy:updatePeriod')?.innerText,
  ),
  updateFrequency: parseInt(
    findElementOrNull(rdfElement, 'sy:updateFrequency')?.innerText,
  ),
  updateBase: parseDateTime(
    findElementOrNull(rdfElement, 'sy:updateBase')?.innerText,
  ),
  dc: channel.isEmpty ? null : DublinCore.parse(rdfElement.findElements('channel').first),
);
```
Since this object is parsed from the whole document, it should capture enough information to reconstruct the document, but it doesn't.
The parser pulls the `title`, `description`, and `link` from the top-level RDF element, as it should.
It also collects and parses all the top-level `item`s within the RDF element, as it should.
However, the top-level `image` is reduced to a single attribute, even though an `image` element can contain a `title`, `link`, and `url`. So we seem to be losing information. Based on a quick check of the spec, this parser might be confusing two different images. There's an `image` element directly under the `RDF` element, which is the one we want. Then there's an `image` element under the `channel` element, which only points at the former through its `rdf:resource` attribute. This parser is treating the `image` like the `channel` version, but it should be treating it like the top-level `RDF` version.
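A minimal sketch of how the top-level `image` could be modeled instead of being flattened to a string. `Rss1Image` and `resolveChannelImage` are hypothetical names, not existing package APIs. The fields mirror the top-level `<image>` element in the example feed (`rdf:about`, `title`, `link`, `url`), and the helper resolves the channel's `rdf:resource` reference against the top-level images by URI.

```dart
/// Hypothetical model of the top-level RSS 1.0 `<image>` element
/// (the one that is a direct child of `rdf:RDF`, not of `channel`).
class Rss1Image {
  const Rss1Image({this.about, this.title, this.link, this.url});

  /// Value of the `rdf:about` attribute; identifies this image resource.
  final String? about;
  final String? title;
  final String? link;
  final String? url;
}

/// Hypothetical helper: the channel-level `<image rdf:resource="..."/>`
/// only references a top-level `<image>` by URI, so look that URI up
/// among the top-level images. Returns null if there is no match.
Rss1Image? resolveChannelImage(String? resource, List<Rss1Image> images) {
  if (resource == null) return null;
  for (final image in images) {
    if (image.about == resource) return image;
  }
  return null;
}
```

The fields are nullable to match the existing `String?` style in `rss1_feed.dart`; the spec lists them as required, so a stricter model would also be reasonable.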
Also, the top-level `textinput` element isn't parsed at all, despite being part of the specification.
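Similarly, a hypothetical `Rss1TextInput` class could hold the top-level `textinput` element's `rdf:about` attribute and its `title`, `description`, `name`, and `link` children. The `searchUri` method is illustrative only: it assumes the form is submitted via GET to `link` with `name` as the query parameter, which is how I read the spec. It shows why `name` and `link` need to survive parsing.

```dart
/// Hypothetical model of the top-level RSS 1.0 `<textinput>` element.
class Rss1TextInput {
  const Rss1TextInput({
    this.about,
    this.title,
    this.description,
    this.name,
    this.link,
  });

  /// Value of the `rdf:about` attribute.
  final String? about;
  final String? title;
  final String? description;

  /// Name of the form field whose value is submitted.
  final String? name;

  /// URL the form submission is directed to.
  final String? link;

  /// Illustrative helper (assumption: GET submission to `link`).
  /// Returns null when `name` or `link` is missing.
  Uri? searchUri(String query) {
    final name = this.name;
    final link = this.link;
    if (name == null || link == null) return null;
    final base = Uri.parse(link);
    // Keep any query parameters already present on `link`.
    return base.replace(
      queryParameters: {...base.queryParameters, name: query},
    );
  }
}
```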
We need to fix the RSS 1.0 data model before serializing it. Blocked on #47.
This package currently includes many data structures that are parsed from RSS (and RSS extension) XML. However, these structures can only be parsed; there is no way to serialize them back to XML.
Add serialization for RSS 1.0 documents (not RSS 2.0 or Atom).
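As a possible starting point for the serialization itself, here's a minimal, dependency-free sketch that writes top-level resources back out. `escapeXml`, `serializeResource`, and `serializeDocument` are hypothetical helpers, not existing package APIs; the namespace declarations are copied from the example feed.

```dart
/// Escapes characters that are unsafe in XML text and in
/// double-quoted attribute values.
String escapeXml(String text) => text
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;');

/// Hypothetical helper: serializes one top-level RSS 1.0 resource
/// (`item`, `image`, or `textinput`) as `<tag rdf:about="...">` with one
/// child element per non-null entry, in map insertion order.
String serializeResource(
  String tag,
  String about,
  Map<String, String?> children,
) {
  final buffer = StringBuffer('<$tag rdf:about="${escapeXml(about)}">\n');
  children.forEach((name, value) {
    if (value == null) return; // Omit absent optional children.
    buffer.write('  <$name>${escapeXml(value)}</$name>\n');
  });
  buffer.write('</$tag>');
  return buffer.toString();
}

/// Hypothetical helper: wraps already-serialized elements in the
/// `rdf:RDF` root with the example feed's namespace declarations.
String serializeDocument(List<String> elements) {
  final buffer = StringBuffer()
    ..write('<?xml version="1.0"?>\n')
    ..write('<rdf:RDF\n')
    ..write('  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n')
    ..write('  xmlns="http://purl.org/rss/1.0/"\n')
    ..write('>\n');
  for (final element in elements) {
    buffer.write('$element\n');
  }
  buffer.write('</rdf:RDF>\n');
  return buffer.toString();
}
```

This only covers the flat top-level resources; the `channel` element, its `rdf:Seq` of item references, and the extension modules (`sy:`, Dublin Core) would need their own handling. `package:xml` also provides an `XmlBuilder` that could replace the manual string building and escaping.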