Tallefer / pubsubhubbub

Automatically exported from code.google.com/p/pubsubhubbub

Hub does not support all character encodings #19

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Example feed: http://feeds.feedburner.jp/junkblog

Top of the feed says: <?xml version="1.0" encoding="EUC-JP"?>

EUC-JP = http://en.wikipedia.org/wiki/Extended_Unix_Code

Background info from:
http://mail.python.org/pipermail/python-list/2008-January/646960.html

"Expat doesn't support as many encodings as Python does, and its repertoire
of encodings can't be extended; it supports UTF-8, UTF-16, ISO-8859-1
(Latin1), and ASCII. If encoding is given it will override the implicit or
explicit encoding of the document."
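
One possible workaround, sketched here in Python 3 rather than the Python 2.5 runtime this issue was filed against: decode the bytes with Python's own codecs, re-encode them as UTF-8, and give Expat an explicit encoding so that it ignores the declaration in the document. The helper names (declared_encoding, transcode_to_utf8, collect_text) are hypothetical and are not part of the hub's code. The sketch also assumes the declared encoding is ASCII-compatible, so that the XML declaration can be read straight from the raw bytes.

```python
import re
import xml.parsers.expat

# Matches the encoding pseudo-attribute in a leading XML declaration.
_DECL_RE = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data, default="utf-8"):
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL_RE.match(data[:200])
    return match.group(1).decode("ascii") if match else default


def transcode_to_utf8(data):
    """Return the document as UTF-8 bytes, or None if Python cannot decode it."""
    try:
        return data.decode(declared_encoding(data)).encode("utf-8")
    except (LookupError, UnicodeDecodeError):
        # LookupError: Python has no codec under that name.
        # UnicodeDecodeError: the bytes do not match the declared encoding.
        return None


def collect_text(data):
    """Parse the document with Expat and return its character data."""
    utf8 = transcode_to_utf8(data)
    if utf8 is None:
        return None
    chunks = []
    # The explicit encoding overrides the (now stale) one in the declaration.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    parser.CharacterDataHandler = chunks.append
    parser.Parse(utf8, True)
    return "".join(chunks)
```

For example, collect_text applied to an EUC-JP encoded document returns its text as a Python str, while a document that declares an encoding Python does not know yields None.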

Original issue reported on code.google.com by bslatkin on 10 Jul 2009 at 6:28

GoogleCodeExporter commented 9 years ago
Here's another one!

unknown encoding: windows-874
Traceback (most recent call last):
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 509, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/pubsubhubbub/feed-ids.336696427949851964/main.py", line 319, in decorated
    return func(myself, *args, **kwargs)
  File "/base/data/home/apps/pubsubhubbub/feed-ids.336696427949851964/main.py", line 2554, in post
    feed_id = feed_identifier.identify(response.content, feed_type)
  File "/base/data/home/apps/pubsubhubbub/feed-ids.336696427949851964/feed_identifier.py", line 115, in identify
    parser.parse(data_stream)
  File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/base/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
LookupError: unknown encoding: windows-874

Original comment by bslatkin on 30 Sep 2009 at 8:33

GoogleCodeExporter commented 9 years ago
Fixed a while ago; we now just drop feeds whose encodings we can't parse on the
ground.

Original comment by bslatkin on 3 Jun 2010 at 11:31
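
For illustration only, "dropping" such a feed could look roughly like the Python 3 sketch below: the parse is wrapped so that an encoding the parser cannot handle returns None instead of raising out of the request handler. This is not the hub's actual fix. The names root_element_or_none and _RootNameHandler are hypothetical, and the set of exceptions caught is an assumption about how a Python 3 SAX parse can fail.

```python
import xml.sax


class _RootNameHandler(xml.sax.ContentHandler):
    """Records the name of the first element seen (the document root)."""

    def __init__(self):
        super().__init__()
        self.root = None

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name


def root_element_or_none(data):
    """Return the root element name, or None if the feed cannot be parsed."""
    handler = _RootNameHandler()
    try:
        xml.sax.parseString(data, handler)
    except (LookupError, ValueError, xml.sax.SAXException):
        # Unknown or unsupported encoding, or malformed XML: drop the feed.
        return None
    return handler.root
```

A caller would treat a None result as "skip this feed" and carry on with the rest of the work.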