HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

MemoryError exception on some non-feed pages #353

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

When trying to parse this URL: http://bitspace.dyndns.org/ python ended up with 
this MemoryError exception:

Traceback (most recent call last):
  File "./enable_working_channels_marked_as_failed.py", line 236, in <module>
    main()
  File "./enable_working_channels_marked_as_failed.py", line 222, in main
    if is_channel_reachable(conn, bad_channel_id):
  File "./enable_working_channels_marked_as_failed.py", line 160, in is_channel_reachable
    feed_resource = feedparser.parse(url_to_check)
  File "/usr/local/lib/python2.7/dist-packages/feedparser.py", line 3926, in parse
    proposed_encoding = unicode(chardet.detect(data)['encoding'], 'ascii', 'ignore')
  File "/usr/lib/python2.7/dist-packages/chardet/__init__.py", line 24, in detect
    u.feed(aBuf)
  File "/usr/lib/python2.7/dist-packages/chardet/universaldetector.py", line 115, in feed
    if prober.feed(aBuf) == constants.eFoundIt:
  File "/usr/lib/python2.7/dist-packages/chardet/charsetgroupprober.py", line 59, in feed
    st = prober.feed(aBuf) 
  File "/usr/lib/python2.7/dist-packages/chardet/sjisprober.py", line 53, in feed
    for i in range(0, aLen):
MemoryError

Original issue reported on code.google.com by analo...@gmail.com on 7 May 2012 at 2:47

GoogleCodeExporter commented 9 years ago
This appears to be an issue with chardet, which unfortunately hasn't been 
maintained for four years. Try uninstalling chardet.

Original comment by kurtmckee on 7 May 2012 at 3:07

GoogleCodeExporter commented 9 years ago
I just did; I'll open a new bug report if it happens again.
Thanks!

Original comment by analo...@gmail.com on 7 May 2012 at 3:35

GoogleCodeExporter commented 9 years ago
I tried downloading the file to test it myself, and I received a binary 
Shoutcast stream. The server could be sniffing the user agent to decide whether 
to serve an HTML page or a Shoutcast stream. Anyway, I killed the download 
after about 5MB, but it's likely that this kind of page will consume all of the 
available memory. It may be worthwhile to use an external HTTP client library 
that can protect your application from this kind of behavior. I don't have any 
recommendations, but you'll probably want to limit the size of the downloaded 
file.
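One way to apply the advice above is to cap the download yourself and hand the truncated bytes to feedparser. A minimal sketch, assuming a file-like response object such as the one returned by the standard library's `urlopen`; the 1 MB cap and the chunk size are illustrative choices, not values from the report:

```python
import io

MAX_BYTES = 1 * 1024 * 1024  # illustrative 1 MB cap


def read_capped(stream, limit=MAX_BYTES):
    """Read at most `limit` bytes from a file-like object.

    An endless response (e.g. a Shoutcast audio stream served in
    place of a feed) is cut off at the cap instead of exhausting
    all available memory.
    """
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(8192, remaining))
        if not chunk:  # normal end of a well-behaved response
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Hypothetical usage (URL is a placeholder):
#   from urllib.request import urlopen
#   data = read_capped(urlopen("http://example.com/feed"))
#   parsed = feedparser.parse(data)
```

`feedparser.parse()` accepts a string of feed data as well as a URL, so fetching with your own capped HTTP client keeps feedparser (and chardet, if installed) from ever seeing more than `limit` bytes.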

Original comment by kurtmckee on 10 May 2012 at 3:55