ankurpiyush26 / pubsubhubbub

Automatically exported from code.google.com/p/pubsubhubbub

Hub does not handle external XML entities gracefully #116

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Could not get entries for content of 98994 bytes in format "atom" for topic 
u'http://blip.fm/feed/all.atomextended':
Traceback (most recent call last):
  File "/base/data/home/apps/pubsubhubbub/memory-queue.342978944075695396/main.py", line 2469, in parse_feed
    feed_record.topic, format, content)
  File "/base/data/home/apps/pubsubhubbub/memory-queue.342978944075695396/main.py", line 2317, in find_feed_updates
    header_footer, entries_map = filter_feed(feed_content, format)
  File "/base/data/home/apps/pubsubhubbub/memory-queue.342978944075695396/feed_diff.py", line 245, in filter
    parser.parse(data_stream)
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
    raise exception
SAXParseException: <unknown>:132:38: undefined entity

Original issue reported on code.google.com by bslatkin on 28 Jun 2010 at 6:04

GoogleCodeExporter commented 9 years ago
Here's an example entity that crept in there (&Oacute;):

    <item>
        <title>RD9: European GP (Valencia) - race results</title>
        <description>(F1Complete.com) 2010 FORMULA 1 TELEF&Oacute;NICA GRAND PRIX OF EUROPE 2010 - Race Date: Sunday 27 June </description>
        <link>http://www.totalf1.com/details/view/344866/</link>
        <guid>http://www.totalf1.com/details/view/344866/</guid>
        <pubDate>Sun, 27 Jun 2010 20:30:09 GMT</pubDate>
    </item>

Original comment by bslatkin on 28 Jun 2010 at 6:08
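
For anyone trying to reproduce this, here is a minimal sketch of the failure using Python's standard `xml.sax` module. It assumes the offending character arrived as an HTML named entity such as `&Oacute;`, which XML does not predefine. `BAD_FEED` and `try_parse` are hypothetical names for illustration, not code from the hub.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical minimal fragment. &Oacute; is an HTML named entity; XML only
# predefines &lt; &gt; &amp; &apos; and &quot;, so the parser rejects it.
BAD_FEED = b"<rss><item><title>TELEF&Oacute;NICA</title></item></rss>"
GOOD_FEED = b"<rss><item><title>TELEF&#211;NICA</title></item></rss>"


def try_parse(data):
    """Return None if data parses cleanly, else the SAX error message."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


print(try_parse(BAD_FEED))
print(try_parse(GOOD_FEED))
```

The numeric character reference `&#211;` in `GOOD_FEED` parses because it needs no entity declaration, so the fix on the publisher's side would be to emit that (or the literal character) instead.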

GoogleCodeExporter commented 9 years ago
Python's ElementTree doesn't give me many options here, and I'm inclined not to 
fix this, since technically it's bad XML. Luckily, r392 adds arbitrary content 
support, which causes unparsable feeds like this one to just pass through the 
hub unmodified without any feed entry diffing whatsoever.

Original comment by bslatkin on 6 Nov 2010 at 1:17
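
The pass-through behaviour described above can be sketched roughly as follows. This is not the actual r392 code, only an illustration of the idea under stated assumptions: `ItemCounter` is a hypothetical stand-in for real entry diffing, and `route_feed` is a hypothetical name.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ItemCounter(ContentHandler):
    """Hypothetical stand-in for entry diffing: just counts <item> elements."""

    def __init__(self):
        super().__init__()
        self.items = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.items += 1


def route_feed(content):
    """Return ("diffed", item_count) if content parses, else ("passthrough", content)."""
    handler = ItemCounter()
    try:
        xml.sax.parseString(content, handler)
    except xml.sax.SAXParseException:
        # Unparsable feed: forward the original bytes untouched, with no diffing.
        return ("passthrough", content)
    return ("diffed", handler.items)
```

With this shape, a feed containing an undefined entity is delivered to subscribers byte for byte, while well-formed feeds still go through the parsing path.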