Closed GoogleCodeExporter closed 9 years ago
No, relative links are not allowed, see
http://www.feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.markenblog.de%2Ffeed%2F
So GReader must be taking the server name from the <link> tag...
Original comment by and...@gmail.com
on 8 Aug 2008 at 8:22
Still, the feedvalidator confirms the feed is valid. It only states that absolute links are recommended, since some feed readers cannot process relative ones.
I've been looking into the RSS specification, but it does not clarify matters, except for stating that all RSS files have to be XML 1.0 compliant.
[http://www.w3.org/TR/REC-xml/#sec-external-ent XML 1.0] itself states that "relative URIs are relative to the location of the resource within which the entity declaration occurs". As far as I understand it, this means any and all relative URIs should be resolved to full URIs with the help of either the document's location or external information.
Additionally, the (not really relevant) XML Base specification (http://www.w3.org/TR/xmlbase/#syntax) states that unless an xml:base attribute is present, relative URIs should be resolved using the XML file's URL (in this case the feed file). The full sequence is shown in 4.1.
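To illustrate what resolving against the feed's own URL would mean in practice, here is a minimal sketch in Python. It is not taken from the project's code; the function name `resolve_against_feed` and the example.org URLs are made up for illustration, and only the standard library's `urljoin` is used.

```python
from urllib.parse import urljoin


def resolve_against_feed(feed_url, reference):
    """Resolve a possibly relative reference against the feed's own URL."""
    # urljoin applies the standard relative-reference resolution rules;
    # references that are already absolute are returned unchanged.
    return urljoin(feed_url, reference)


# Hypothetical feed location, for illustration only.
FEED_URL = "http://www.example.org/feed/"

print(resolve_against_feed(FEED_URL, "/wp-content/uploads/logo.png"))
print(resolve_against_feed(FEED_URL, "images/logo.png"))
print(resolve_against_feed(FEED_URL, "http://other.example.net/a.png"))
```

Note that a path-relative reference such as `images/logo.png` ends up below `/feed/`, which is rarely where the blog keeps its images. That is one reason the feed URL is a questionable base.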
Original comment by eljo...@gmail.com
on 8 Aug 2008 at 9:04
OK, let me give you a counterexample. Let's say we have two blogs, each with one blog post, and both posts use relative URLs. Now we have an aggregating feed that includes both blogs, so it contains two articles. The feed itself lives at a different URL than the articles, and the two articles are on different URLs as well. Thus the location of the feed cannot be used as the base, and the only thing we can use is the <link> tag, which contains the URL of the article.
My point here is that taking the base URL from <link> is more robust and more likely to work in such insane situations. During development I had to write a couple of patches to work around problems in feeds, simply because the feeds did not follow the standards. I like standards, but people (implementors) must follow them and not interpret them by themselves. Nothing personal, just a general complaint :-)
Of course, the problem is reproducible, so it will be quite easy to fix it...
Original comment by and...@gmail.com
on 10 Aug 2008 at 6:11
Your example of an aggregator exposes the very weak spot of using the <link> tag (and the base URL). The (very) well-known feedburner feeds use a redirection URL as the item's <link> target. This means that using the <link> URL as a base will break those feeds (if they're already "broken" ;). Now, feedburner adds an additional tag, <feedburner:origLink>, that could be used for a dirty workaround, but so far the only other reliable URL in those feeds would be the <link> tag of <channel>. This, however, will not work with items joined from multiple feeds either. Fortunately I did not encounter feedburner feeds with relative URLs.
So I guess whatever you choose to implement will break in one case or another, be it the <link> of the item, the <link> of the channel or the URI of the feed. Personally, I agree: it would be really nice if every content provider adhered to the recommendations, given the lack of a more detailed RSS specification. And I also agree that implementing this "feature" is an unpleasant workaround for a gap in the spec.
BTW: I've found some more feeds that rewrite the <link> target to use an off-site "counter" service. Fortunately those do not use relative links within the HTML code (e.g. slashdot and boingboing.net, both probably using a customized feedburner service).
Original comment by eljo...@gmail.com
on 10 Aug 2008 at 6:31
One more thing I found is this:
http://cyber.law.harvard.edu/rss/relativeURI.html
Briefly, when xml:base is present, use it. If not, use /rss/channel/link as the base, which I think is the way to go.
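The rule above (xml:base first, then /rss/channel/link) could be sketched roughly as follows. This is a simplified illustration in Python, not the project's actual implementation: the helper name `pick_base_url` and the sample feeds are assumptions, and the sketch only looks for xml:base on the <channel> and <rss> elements rather than honouring it at every nesting level.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes the xml:base attribute under the XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def pick_base_url(rss_text):
    """Return xml:base if present, otherwise /rss/channel/link, else None."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    # Simplification: only <channel> and <rss> are checked for xml:base.
    for element in (channel, root):
        if element is not None and element.get(XML_BASE):
            return element.get(XML_BASE)
    if channel is not None:
        link = channel.findtext("link")
        if link and link.strip():
            return link.strip()
    return None


# Hypothetical feeds, for illustration only.
PLAIN_FEED = """<rss version="2.0">
  <channel>
    <title>Example</title>
    <link>http://www.example.org/</link>
  </channel>
</rss>"""

FEED_WITH_XML_BASE = """<rss version="2.0" xml:base="http://cdn.example.org/blog/">
  <channel>
    <title>Example</title>
    <link>http://www.example.org/</link>
  </channel>
</rss>"""

print(urljoin(pick_base_url(PLAIN_FEED), "/wp-content/a.png"))
print(urljoin(pick_base_url(FEED_WITH_XML_BASE), "img/a.png"))
```

If neither source yields a base, the helper returns None and the caller has to decide what to do; falling back to the feed's own URL would be one option.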
Original comment by and...@gmail.com
on 10 Aug 2008 at 7:31
Original comment by and...@gmail.com
on 11 Aug 2008 at 6:40
I implemented the fix, but the HTML included in http://www.markenblog.de/feed/ confuses libsgml, so it is not parsed correctly. For now this feed will not work correctly until I fix the libsgml parser.
The problem is that there is no space between attributes, for example:
<a href="http://www.ipeg.eu/blog/?p=304"target=_blank">
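For comparison, a tolerant attribute scanner can cope with this kind of markup. The sketch below is written in Python and is only an illustration of the idea; it is not libsgml code and says nothing about how libsgml works internally. The name `parse_attributes` and the regular expression are assumptions made for this example.

```python
import re

# Matches name=value pairs without requiring whitespace between attributes.
ATTRIBUTE = re.compile(
    r"""([A-Za-z_:][-\w:.]*)      # attribute name
        \s*=\s*
        (?:"([^"]*)"              # double-quoted value
          |'([^']*)'              # single-quoted value
          |([^\s"'>]+))           # unquoted value
    """,
    re.VERBOSE,
)


def parse_attributes(tag):
    """Collect the attributes of a single start tag into a dict."""
    attributes = {}
    for match in ATTRIBUTE.finditer(tag):
        name = match.group(1).lower()
        # Exactly one of the three value groups takes part in each match.
        value = next(g for g in match.groups()[1:] if g is not None)
        attributes[name] = value
    return attributes


tag = '<a href="http://www.ipeg.eu/blog/?p=304"target=_blank">'
print(parse_attributes(tag))
```

Because the scanner looks for the next `name=value` pair instead of requiring a separator, the href value is recovered intact even though `target` follows it directly.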
Original comment by and...@gmail.com
on 11 Aug 2008 at 10:19
I'm not sure whether this is expected: in r69 it tries to fetch URLs like http:///wp-content/... - note the triple slash and the lack of the hostname. So for this feed it's not working.
Original comment by eljo...@gmail.com
on 25 Aug 2008 at 5:57
It is the result of the combination of libsgml and the missing space between attributes, as I wrote in #7. So the code is OK, but there is another bug in libsgml that has to be fixed.
Original comment by and...@gmail.com
on 25 Aug 2008 at 1:16
Ah, so a third fallback to the feed's URL is not implemented (yet).
Original comment by eljo...@gmail.com
on 25 Aug 2008 at 1:27
If I'm right, this feed has been fixed and there are no longer any missing spaces between attributes. Thus I do not have to patch libsgml.
Original comment by and...@gmail.com
on 13 Sep 2008 at 5:53
Original issue reported on code.google.com by
eljo...@gmail.com
on 8 Aug 2008 at 8:02