Closed GoogleCodeExporter closed 9 years ago
Same for rightload.info, a machine learning based personalized feed filter.
Backend falls asleep every couple of days because of this.
Original comment by picco...@gmail.com
on 16 Sep 2010 at 5:44
Original comment by adewale
on 26 Oct 2010 at 3:42
The problem with using the timeout argument is that it was only introduced in
Python 2.6. I remember seeing a helper library for doing this in older versions
of Python, but I'm not finding it now.
Original comment by kurtmckee
on 4 Dec 2010 at 4:37
@Adewale: After looking through the Python documentation and considering how
such a feature might be implemented in feedparser, I don't believe that this
would be a worthwhile addition.
Unlike the other arguments to `parse()`, I don't see a pressing need for
individualized timeouts (as opposed to, say, User-Agent headers, which might
need to be modified on a case-by-case basis depending on whether a particular
server will filter access based on the User-Agent). It seems more reasonable to
expect that developers would want to set a global timeout, particularly since
Python's default is to never time out.
However, developers have had the ability to set a global timeout for over seven
years by importing the socket library and setting the timeout in this way:
import socket
socket.setdefaulttimeout(<timeout in floating seconds>)
This is a feature that was introduced in Python 2.3 [1]. I think it would be
appropriate to allow developers to set the desired timeout using this existing
mechanism, rather than setting a global variable in feedparser and then relying
on feedparser to call `socket.setdefaulttimeout()` on their behalf.
It's for these reasons that I think that this is an undesirable addition to
feedparser, and I recommend closing this bug report.
[1]: http://docs.python.org/library/socket.html#socket.setdefaulttimeout
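A minimal sketch of the global-timeout approach described above, using only the standard `socket` module. The 10-second value is an arbitrary illustration, not a figure from this thread, and `probe` is just a throwaway socket used to show that the default is inherited.

```python
import socket

# Before any call to setdefaulttimeout(), the default is None,
# which means socket operations block indefinitely.
print(socket.getdefaulttimeout())

# Assumption: 10.0 seconds is an illustrative value only.
socket.setdefaulttimeout(10.0)

# Sockets created after this point pick up the new default,
# including the ones urllib2 opens on feedparser's behalf.
probe = socket.socket()
inherited = probe.gettimeout()
probe.close()
print(inherited)
```

Because the setting is process-wide, it affects every socket created afterwards, not only the ones used to fetch feeds.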
Original comment by kurtmckee
on 26 Dec 2010 at 2:06
I agree. The only safe way to do this would be to read the timeout value when
Feedparser is invoked, store it somewhere, set a new timeout and then every
exit path would have to reset the timeout value. However this still wouldn't be
threadsafe.
We can revisit this if and when Python offers a more fine-grained timeout
feature.
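The save, set and restore sequence described above could look roughly like the following sketch. The name `temporary_default_timeout` is hypothetical and is not part of feedparser. As noted, this is still not thread-safe: the default is process-wide, so other threads creating sockets inside the `with` block would see the temporary value too.

```python
import contextlib
import socket


@contextlib.contextmanager
def temporary_default_timeout(seconds):
    """Hypothetical helper: set the global socket timeout, then restore it."""
    previous = socket.getdefaulttimeout()   # read and store the current value
    socket.setdefaulttimeout(seconds)       # set the new timeout
    try:
        yield
    finally:
        # Runs on every exit path, including exceptions.
        socket.setdefaulttimeout(previous)
```

A caller would wrap the fetch in `with temporary_default_timeout(10.0): ...`, where the value is again only an illustration.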
Original comment by adewale
on 26 Dec 2010 at 11:58
Issue 245 has been merged into this issue.
Original comment by adewale
on 3 Jan 2011 at 8:00
I run feedparser.parse() on a website with a few FastCGI processes. When users
concurrently submit bad feeds, my website locks up.
I agree that adding a timeout parameter to parse() would be misleading to
people using Python versions earlier than 2.6, and that this is an advanced
feature most users don't care about.
I suggest adding a constant 'TIMEOUT' to feedparser for advanced users on
Python >= 2.6; when its value is not None, it is passed as the timeout to
urllib2.urlopen().
Original comment by flytwoki...@gmail.com
on 4 Jan 2011 at 2:00
I'm sorry but the best solution for your issue is to add these 2 lines to your
codebase somewhere before you invoke feedparser:
import socket
socket.setdefaulttimeout(<timeout in floating seconds>)
Original comment by adewale
on 4 Jan 2011 at 2:58
urllib2.urlopen(url[, data][, timeout])
If they put timeout in the interface, feedparser should too. Plus I don't
believe people are OK with a) Providing a broken abstraction that forces people
to go two levels down (feedparser -> urllib2 -> socket) to set a global with
some boilerplate code documented only in a ticket b) Forcing every single
developer who is using feedparser for serious work to have their program go
dormant and debug and google their way all the way to this ticket. It took me
three days to get here, is that what we want feedparser users to go through?
Because if you fetch enough feeds, you will run into this.
Original comment by picco...@gmail.com
on 4 Jan 2011 at 5:19
@piccolbo: The `timeout` argument is only available in Python 2.6 and up. The
concern here is that, while it's possible to use a `try-except` block to
attempt to use the Python 2.6 `timeout` argument, it will introduce a
module-level variable that will be ignored in Python 2.4 and 2.5 (or worse,
feedparser will try setting the global timeout in those versions so that it
Just Works but will introduce a thread-safety issue while doing so). It's for
this reason that we're currently choosing to punt on the issue.
A patch introducing one of these problems while addressing your particular need
is trivially easy to write. What I want is a way to resolve both your needs as
well as the concerns outlined above.
Original comment by kurtmckee
on 4 Jan 2011 at 5:57
Both sides have valid arguments and the discussion might last a while. That
being said, the first thing to do should be to emphasize this issue in the
feedparser documentation. This should happen in a way that it's hard to miss,
to spare others the hassle of googling the answer...
Original comment by moritzk...@gmail.com
on 4 Jan 2011 at 6:39
@kurtmckee I propose that we use the try except block and raise an exception if
timeout is not supported (python <2.6) instead of trying something ugly. It
doesn't get any worse for older pythons and paves the way for a future
(actually present) that just works and is a good abstraction. And incidentally
one can explain the situation to the developer when documenting the timeout
argument, which also incorporates @moritzkrog's suggestion and my point b) and
turns a lot of pain into at most disappointment and maybe a Python upgrade.
Original comment by picco...@gmail.com
on 4 Jan 2011 at 6:59
@piccolbo one of the more popular uses of Feedparser is on AppEngine where
users are restricted to Python 2.5.
I'm strongly opposed to having arguments on methods that only work in certain
versions of Python.
If you feel strongly about it but don't want to write the 2 lines of code
needed to make the current solution work then I suggest you send in a patch to
provide a parse_with_timeout method which raises an exception with an explicit
error message if called in the wrong version of Python and sets the timeout
value in a thread-safe manner.
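One possible shape for such a helper, as a sketch only. The name `fetch_with_timeout` is hypothetical and nothing like it exists in feedparser. It only fetches the raw bytes, using the per-call `timeout` argument of `urlopen` so that no global state is touched. Handing the result to `feedparser.parse()` is left to the caller, on the assumption that `parse()` accepts feed content as a string.

```python
import contextlib
import sys

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch_with_timeout(url, timeout):
    """Hypothetical helper: fetch `url` with a per-call socket timeout."""
    if sys.version_info < (2, 6):
        # Explicit error instead of silently ignoring the argument.
        raise RuntimeError(
            "fetch_with_timeout() needs Python 2.6 or later; on older "
            "versions use socket.setdefaulttimeout() instead"
        )
    # The timeout applies to blocking socket operations for this call only,
    # so other threads are unaffected.
    with contextlib.closing(urlopen(url, timeout=timeout)) as response:
        return response.read()
```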
Original comment by adewale
on 6 Jan 2011 at 3:47
@adewale, Rightload has been fixed long ago using eventlet timeouts. I was just
trying to help. You can keep it broken if that rocks your boat.
Original comment by picco...@gmail.com
on 6 Jan 2011 at 4:48
Now that GAE recommends Python2.7:
- https://developers.google.com/appengine/docs/python/gettingstarted/
can this ticket be reopened?
Original comment by alessand...@redturtle.it
on 5 Feb 2013 at 3:44
No.
Original comment by kurtmckee
on 6 Feb 2013 at 12:19
Issue 406 has been merged into this issue.
Original comment by kurtmckee
on 10 Jul 2014 at 1:58
Hello, I got bitten by this and stumbled across the three tickets that pretty
much request the same thing. Given that it has been almost 4 years since the
first opening of this ticket and that pretty much nobody recommends python <2.7
(I have no numbers as to who requires python <2.6) it might be time to
rethink Python 2.4 and 2.5 support.
In other news, I already worked around this by using urllib2 directly, but I
lost the very nice support for ETag and Last-Modified that feedparser has:
https://pythonhosted.org/feedparser/http-etag.html
which is... a shame.
Thanks!
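For anyone taking the same route, here is a rough sketch of a conditional GET with a per-call timeout using the Python 3 standard library (`urllib.request` rather than `urllib2`). The function names, the returned dictionary layout and the 10-second default are all assumptions made for illustration. This is not how feedparser implements its ETag and Last-Modified support.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, modified=None):
    """Hypothetical helper: attach validators from a previous fetch, if any."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if modified:
        request.add_header("If-Modified-Since", modified)
    return request


def fetch_if_changed(url, etag=None, modified=None, timeout=10.0):
    """Hypothetical helper: return the body, or None on 304 Not Modified."""
    request = build_conditional_request(url, etag, modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return {
                "status": response.status,
                "body": response.read(),
                # Keep the new validators for the next request.
                "etag": response.headers.get("ETag"),
                "modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # urllib reports 304 as an HTTPError; nothing changed.
            return {"status": 304, "body": None,
                    "etag": etag, "modified": modified}
        raise
```

The caller stores `etag` and `modified` from one result and passes them back on the next fetch. A non-None `body` can then be parsed as usual.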
Original comment by akosia...@gmail.com
on 3 Dec 2014 at 5:22
Original issue reported on code.google.com by
locher.h...@gmail.com
on 22 Jun 2010 at 8:33