HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

Parsing from a string no longer works #332

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
2. feedparser.parse(response.content)

What version of the product are you using? On what operating system?
5.1, python 2.5.2, Google App Engine

Please provide any additional information below.
There is a bug in the code that prevents parsing a feed from a string:

Stack:

1. def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, 
referrer=None, handlers=None, request_headers=None, response_headers=None):
...
        f = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers, request_headers)
...

2. def _open_resource(url_file_stream_or_string, etag, modified, agent, 
referrer, handlers, request_headers):
...
# HERE IS BUG! (Unhandled exception)
    try:
        return open(url_file_stream_or_string, 'rb')
    except IOError:
        pass

...

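The call does not raise IOError (the result below shows a TypeError instead), 
so the except clause does not catch it. Parameters should be validated 
explicitly instead of relying on exceptions, which are not designed for 
validation; that approach is quick but risky, and it slows development: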

{'bozo': 1,
 'bozo_exception': TypeError('_getfullpathname() argument 1 must be (buffer overflow), not str',),
 'entries': [],
 'feed': {}}

Original issue reported on code.google.com by Cezary.W...@gmail.com on 25 Feb 2012 at 8:33

GoogleCodeExporter commented 9 years ago
The problem is triggered by os.path.isfile(url_file_stream_or_string); it 
needs a check that the string is not too long.
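
A minimal sketch of such a check, not feedparser's actual code. The helper names and the 255-character limit are assumptions made for illustration; real path-length limits differ between operating systems and file systems.

```python
import os

# Hypothetical limit; real path-length limits vary by OS and file system.
MAX_PATH_CHARS = 255


def looks_like_local_path(value, max_chars=MAX_PATH_CHARS):
    """Return True only if value is plausibly a file name, not feed content."""
    if not isinstance(value, str):
        return False
    if len(value) > max_chars:
        return False
    # Feed documents contain markup and line breaks; file names rarely do.
    if "<" in value or "\n" in value or "\0" in value:
        return False
    return True


def is_existing_file(value):
    """Touch the file system only after the cheap validation has passed."""
    return looks_like_local_path(value) and os.path.isfile(value)
```

With a guard like this, an over-long feed string is rejected before it reaches any file system call, so no path-length error can occur.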

BTW, this is an insecure backdoor: a remote attacker could modify the RSS to 
attack the site (any file on the file system could be read by sending its 
name in place of the feed)!

Original comment by Cezary.W...@gmail.com on 25 Feb 2012 at 8:50

GoogleCodeExporter commented 9 years ago
A workaround (response.content is the feed string):
- instead of:
      feed = feedparser.parse(response.content)
- do this:
      feed = feedparser.parse(StringIO(response.content))
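
The same idea as a small helper, sketched with the standard io module so that it runs on Python 2.6+ and Python 3. The name as_stream is hypothetical, and encoding text input as UTF-8 is an assumption that may not match a feed declaring another encoding.

```python
import io


def as_stream(content):
    """Wrap downloaded feed content in an in-memory file-like object."""
    if isinstance(content, bytes):
        return io.BytesIO(content)
    # Text input: encode first so the parser always receives bytes.
    return io.BytesIO(content.encode("utf-8"))


stream = as_stream(b"<rss version='2.0'><channel><title>t</title></channel></rss>")
# The stream would then be passed in place of the raw string, e.g.:
# feed = feedparser.parse(stream)
```

Because the parser receives an object with a read() method, the content is never interpreted as a file name or URL.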

Original comment by Cezary.W...@gmail.com on 25 Feb 2012 at 8:56

GoogleCodeExporter commented 9 years ago
I'm not able to recreate the issue you're seeing. Python 2.5.2 was released 
four years ago, and the only information I can find online seems to suggest 
this is a Windows-specific issue. That could be why I'm not able to recreate 
this problem in Python 2.5.6 on Linux. Are you running GAE on a Windows 
computer, or are you running an application on Google's App Engine servers?

You're right, using an external library to download the feed and sending the 
string to feedparser can create a security issue (though the problem is 
mitigated by the fact that feedparser will still have to be able to parse the 
file enough to return meaningful information that could get presented to the 
malicious user). However, fixing this may require a behavioral change, so I'm 
considering whether to release the fix in the 5.1.1 release or not.

Original comment by kurtmckee on 27 Feb 2012 at 4:17

GoogleCodeExporter commented 9 years ago
Try making a very long feed string, for example: it will overflow the file 
name limit, since file name length is limited, even on Linux if I remember 
correctly. Python 2.5.2 is what Google App Engine uses (the farm most likely 
runs on Linux).

Try this RSS -> http://www.pracuj.pl/praca/rss.aspx?SE=0&R=7

A security fix needs some architectural change. You could, for example, mark 
it as deprecated with a comment that it is insecure; that would tell me not 
to use it anymore ...

Original comment by Cezary.W...@gmail.com on 29 Feb 2012 at 7:36