dimones / feedparser

Automatically exported from code.google.com/p/feedparser
Other

A feed makes feedparser.parse hang #363

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. feedparser.parse('http://www.bizportal.co.il/shukhahon/messRss1.xml')
2. wait ...
3. wait ...

What is the expected output? What do you see instead?
Expected: A parsed feed.
Result: Script hangs.

What version of the product are you using? On what operating system?
feedparser 5.1.2 with Python 2.7.3 (also tried with Python 2.6.3) on Ubuntu
12.04.

Please provide any additional information below.
Running the same scenario on my Windows 7 machine works.

Original issue reported on code.google.com by asfalt...@gmail.com on 16 Jun 2012 at 7:27

GoogleCodeExporter commented 9 years ago
I'm not able to reproduce this issue with feedparser 5.1.2 on Python 2.7.2 or 
Python 2.6.6 on Ubuntu 10.10.

Original comment by kurtmckee on 18 Jun 2012 at 1:28

GoogleCodeExporter commented 9 years ago
Interesting: I tried another machine with Ubuntu 11.10 and Python 2.7.2 and it
worked fine there too... but the original machine still fails.

Also, the feed is in Hebrew, and the XML file declares encoding="Windows-1255",
but the characters seem to be Unicode... Maybe the encoding fallback fails? Is
there any way to fail gracefully or to force the encoding?
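One way to test the suspicion above is to compare the encoding the XML declaration claims with how the raw bytes actually decode. The sketch below is a hypothetical stdlib-only helper for Python 3 (`inspect_feed_encoding` is not part of feedparser), and it assumes you already have the feed's raw bytes, for example saved to disk or downloaded separately. If `decodes_as_utf8` is true for a file containing Hebrew text while the declaration says Windows-1255, the file is probably mislabelled UTF-8; a pure-ASCII file decodes either way, so the check only says something when non-ASCII characters are present.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the document.
_XML_DECL_ENCODING = re.compile(rb"""^\s*<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def inspect_feed_encoding(raw):
    """Hypothetical diagnostic: compare a feed's declared encoding with how its bytes decode.

    Returns a dict with the declared encoding (lower-cased, or None if there is no
    declaration), whether the bytes decode cleanly with it, and whether they decode
    cleanly as UTF-8.
    """
    # Skip a UTF-8 byte-order mark so the declaration regex can still match.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    match = _XML_DECL_ENCODING.match(raw)
    declared = match.group(1).decode("ascii").lower() if match else None

    def decodes_as(encoding):
        try:
            raw.decode(encoding)
            return True
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a declared encoding name Python does not know.
            return False

    return {
        "declared": declared,
        "decodes_as_declared": decodes_as(declared) if declared else None,
        "decodes_as_utf8": decodes_as("utf-8"),
    }
```

This only diagnoses a mismatch; feedparser performs its own encoding detection independently of this helper.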

If not, is there an easy way to trace where it fails?

Original comment by asfalt...@gmail.com on 18 Jun 2012 at 9:06

GoogleCodeExporter commented 9 years ago
I just tried this again using Python 2.7.3 on Ubuntu 12.04 and I'm not able to 
reproduce the issue.

Try hitting Ctrl+C and check the traceback that prints. It should give you an
idea of where the hang is occurring. It's likely that the server isn't
responding for whatever reason and, because there's no timeout, feedparser is
happily waiting for an HTTP response it will never receive.
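On Python 3.3 and later, the standard-library `faulthandler` module can print the same kind of traceback automatically when a call runs too long, without anyone pressing Ctrl+C. This is a sketch assuming Python 3 (the thread above concerns Python 2); `report_if_stuck` is a hypothetical helper name, and the 30-second value in the usage comment is arbitrary.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def report_if_stuck(seconds, file=sys.stderr):
    """Dump every thread's traceback to `file` if the body runs longer than `seconds`.

    Unlike Ctrl+C, this does not interrupt the program; it only reports where it is.
    `file` must have a real file descriptor (faulthandler writes to it directly).
    """
    faulthandler.dump_traceback_later(seconds, repeat=False, file=file)
    try:
        yield
    finally:
        # Disarm the watchdog once the wrapped code finishes (or raises).
        faulthandler.cancel_dump_traceback_later()


# Usage (needs the third-party feedparser package and network access):
#   import feedparser
#   with report_if_stuck(30):
#       d = feedparser.parse("http://www.bizportal.co.il/shukhahon/messRss1.xml")
```

If the parse call hangs on a socket read, the dumped stack should end inside the socket or HTTP-client code rather than inside feedparser's parsing code.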

I recommend fetching the feed separately from parsing it. You could use the
`requests` library; I've heard good things about it, it's a full HTTP client
library, and it supports HTTP timeouts.
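A minimal sketch of that approach, assuming Python 3 and using the standard library's `urllib.request.urlopen` (which accepts a `timeout` argument) in place of `requests`; with `requests` the equivalent download would be `requests.get(url, timeout=10).content`. `fetch_feed_bytes` is a hypothetical helper name. The timeout applies to each blocking socket operation rather than to the whole download, so a server that trickles data slowly can still take longer than `timeout` seconds overall. The downloaded bytes can then be handed to `feedparser.parse`, which accepts document content as well as URLs.

```python
import urllib.request


def fetch_feed_bytes(url, timeout=10.0):
    """Hypothetical helper: download a feed's raw bytes without waiting forever.

    `timeout` bounds each blocking socket operation (connect, each read).
    A server that never answers raises an OSError subclass, such as
    TimeoutError or urllib.error.URLError, instead of hanging.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Usage (needs network access and the third-party feedparser package):
#   import feedparser
#   raw = fetch_feed_bytes("http://www.bizportal.co.il/shukhahon/messRss1.xml", timeout=15)
#   feed = feedparser.parse(raw)
```

Splitting the download from the parse means a stalled server shows up as an exception from the fetch step, not as a silent hang inside `feedparser.parse`.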

Original comment by kurtmckee on 28 Nov 2012 at 3:44