google-code-export / feedparser

Automatically exported from code.google.com/p/feedparser
Other
1 stars 0 forks source link

Proposal: support timeout #221

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Feedparser uses urllib2's open when parsing an http feed
2. It does not support a explicit timeout
3. You have to change the socket defaulttimeout...

--> When trying to parse an unresponding rss feed, the process 
         is blocked. 

What is the expected output? What do you see instead?

urlib2's open supports a timeout:

    def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        #implementation

suggestion:

    pass a given keywordargument "timeout" to urllib2, to overwrite the default
    socket._GLOBAL_DEFAULT_TIMEOUT

What version of the product are you using? On what operating system?

Feedparser 4.1
python 2.6
Mac OSX 10.6.4

Please provide any additional information below.

I've experienced the issue on a large Plone site where many rss portlets (which 
use the Feedparser) are used.

Original issue reported on code.google.com by locher.h...@gmail.com on 22 Jun 2010 at 8:33

GoogleCodeExporter commented 9 years ago
Same for rightload.info, a machine learning based personalized feed filter. 
Backend falls asleep every couple of days because of this.

Original comment by picco...@gmail.com on 16 Sep 2010 at 5:44

GoogleCodeExporter commented 9 years ago

Original comment by adewale on 26 Oct 2010 at 3:42

GoogleCodeExporter commented 9 years ago
The problem with using the timeout argument is that it was only introduced in 
Python 2.6. I remember seeing a helper library for doing this in older version 
of Python, but I'm not finding it now.

Original comment by kurtmckee on 4 Dec 2010 at 4:37

GoogleCodeExporter commented 9 years ago
@Adewale: After looking through the Python documentation and considering how 
such a feature might be implemented in feedparser, I don't believe that this 
would be a worthwhile addition.

Unlike the other arguments to `parse()`, I don't see a pressing need for 
individualized timeouts (as opposed to, say, User-Agent headers, which might 
need to be modified on a case-by-case basis depending on whether a particular 
server will filter access based on the User-Agent). It seems more reasonable to 
expect that developers would want to set a global timeout, particularly since 
Python's default is to never timeout.

However, developers have had the ability to set a global timeout for over seven 
years by importing the socket library and setting the timeout in this way:

    import socket
    socket.setdefaulttimeout(<timeout in floating seconds>)

This is a feature that was introduced in Python 2.3 [1]. I think it would be 
appropriate to allow developers to set the desired timeout using this existing 
mechanism, rather than setting a global variable in feedparser and then relying 
on feedparser to call `socket.setdefaulttimeout()` on the their behalves.

It's for these reasons that I think that this is an undesirable addition to 
feedparser, and I recommend closing this bug report for these reasons.

[1]: http://docs.python.org/library/socket.html#socket.setdefaulttimeout

Original comment by kurtmckee on 26 Dec 2010 at 2:06

GoogleCodeExporter commented 9 years ago
I agree. The only safe way to do this would be to read the timeout value when 
Feedparser is invoked, store it somewhere, set a new timeout and then every 
exit path would have to reset the timeout value. However this still wouldn't be 
threadsafe.

We can revisit this if and when Python offers a more fine-grained timeout 
feature.

Original comment by adewale on 26 Dec 2010 at 11:58

GoogleCodeExporter commented 9 years ago
Issue 245 has been merged into this issue.

Original comment by adewale on 3 Jan 2011 at 8:00

GoogleCodeExporter commented 9 years ago
I run feedparser.parse() on website with few fastcgi processes, when users 
concurrently submit bad feeds, my website will be lock.

I agree add a timeout parameter to parse() will mis-leading to people using 
earlier versions of Python < 2.6, and this is an advanced feature that most 
user don't care.

I suggest to add a constant 'TIMEOUT' to feedparser for advanced user that use 
python >=2.6, when it's value is not None, uses timeout in urllib2.open()

Original comment by flytwoki...@gmail.com on 4 Jan 2011 at 2:00

GoogleCodeExporter commented 9 years ago
I'm sorry but the best solution for your issue is to add these 2 lines to your 
codebase somewhere before you invoke feedparser:
import socket
socket.setdefaulttimeout(<timeout in floating seconds>)

Original comment by adewale on 4 Jan 2011 at 2:58

GoogleCodeExporter commented 9 years ago
urllib2.urlopen(url[, data][, timeout])

If they put timeout in the interface, feedparser should too. Plus I don't 
believe people are OK with a) Providing a broken abstraction that forces people 
to go two levels down (feedparser -> urllib2 -> socket) to set a global with 
some boilerplate code documented only in a ticket b) Forcing every single 
developer who is using feedparser for serious work to have their program go 
dormant and debug and google their way all the way to this ticket. It took me 
three days to get here, is that what we want feedparser users to go through? 
Because if you fetch enough feeds, you will run into this.

Original comment by picco...@gmail.com on 4 Jan 2011 at 5:19

GoogleCodeExporter commented 9 years ago
@piccolbo: The `timeout` argument is only available in Python 2.6 and up. The 
concern here is that, while it's possible to use a `try-except` block to 
attempt to use the Python 2.6 `timeout` argument, it will introduce a 
module-level variable that will be ignored in Python 2.4 and 2.5 (or worse, 
feedparser will try setting the global timeout in those versions so that it 
Just Works but will introduce a thread-safety issue while doing so). It's for 
this reason that we're currently choosing to punt on the issue.

A patch introducing one of these problems while addressing your particular need 
is trivially easy to write. What I want is a way to resolve both your needs as 
well as the concerns outlined above.

Original comment by kurtmckee on 4 Jan 2011 at 5:57

GoogleCodeExporter commented 9 years ago
Both sides habe valid arguments and the discussion might last a while. That 
being said, the first thing to do should be to emphasize this issue in the 
feedparser documentation. This should happen in a way that it's hard to miss, 
to spare others the hassle of googling the answer...

Original comment by moritzk...@gmail.com on 4 Jan 2011 at 6:39

GoogleCodeExporter commented 9 years ago
@kurtmckee I propose that we use the try except block and raise an exception if 
timeout is not supported (python <2.6) instead of trying something ugly. It 
doesn't get any worse for older pythons and paves the way for a future 
(actually present) that just works and is a good abstraction. And incidentally 
one can explain the situation to the developer in documenting the timeout 
argument, so that incorporates also @moritzkrog suggestion and my point b) and 
turn a lot of pain into at most disappointment and maybe a python upgrade.

Original comment by picco...@gmail.com on 4 Jan 2011 at 6:59

GoogleCodeExporter commented 9 years ago
@piccolbo one of the more popular uses of Feedparser is on AppEngine where 
users are restricted to Python 2.5.
I'm strongly opposed to having arguments on methods that only work in certain 
versions of Python.

If you feel strongly about it but don't want to write the 2 lines of code 
needed to make the current solution work then I suggest you send in a patch to 
provide a parse_with_timeout method which raises an exception with an explicit 
error message if called in the wrong version of Python and sets the timeout 
value in a thread-safe manner.

Original comment by adewale on 6 Jan 2011 at 3:47

GoogleCodeExporter commented 9 years ago
@adewale, Rightload has been fixed long ago using eventlet timeouts. I was just 
trying to help. You can keep it broken if that rocks your boat.

Original comment by picco...@gmail.com on 6 Jan 2011 at 4:48

GoogleCodeExporter commented 9 years ago
Now that GAE recommends Python2.7:
 - https://developers.google.com/appengine/docs/python/gettingstarted/
can this ticket be reopened?

Original comment by alessand...@redturtle.it on 5 Feb 2013 at 3:44

GoogleCodeExporter commented 9 years ago
No.

Original comment by kurtmckee on 6 Feb 2013 at 12:19

GoogleCodeExporter commented 9 years ago
Issue 406 has been merged into this issue.

Original comment by kurtmckee on 10 Jul 2014 at 1:58

GoogleCodeExporter commented 9 years ago
Hello, I got bitten by this and stumbled across the three tickets that pretty 
much request the same thing. Given that it has been almost 4 years since the 
first opening of this ticket and that pretty much nobody recommends python <2.7 
(I have no numbers as to who requires python <2.6) it might be the time to 
rethink python 2.4, 2.5 support.

In other news, I already went around and used urllib2 directly but I lost the 
very nice support for ETag and Last-Modified feedparser has:

https://pythonhosted.org/feedparser/http-etag.html

which is... a shame.

Thanks!

Original comment by akosia...@gmail.com on 3 Dec 2014 at 5:22