HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

UnicodeDecodeError xml parser #351

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. feedparser.parse('http://20minutos.feedsportal.com/c/32489/f/478284/index.rss')
2. error -> 'ascii' codec can't decode byte 0xc3 in position 37: ordinal not in range(128)

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?
feedparser 5.1.2 on Ubuntu with Django 1.3

Please provide any additional information below.
The string that could not be encoded/decoded was: tonom��as%2
Exception Location:     /usr/lib/python2.6/urlparse.py in urlunsplit, line 191

/usr/local/lib/python2.6/dist-packages/feedparser.py in _urljoin

Local vars

Variable    Value
base        u'http://20minutos.feedsportal.com/c/32489/f/478284/index.rss'
uri         "/S&P's%20rebaja%20a%20nueve%20autonom\xc3\xadas%20su%20nota%20y%20deja%20al%20borde%20del%20'bono%20basura'%20a%20Catalu\xc3\xb1a%20y%20Baleares"
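
For illustration only, here is a minimal Python 3 sketch of the failure mode these local variables suggest: a unicode `base` joined with a byte-string `uri` that holds UTF-8 bytes. Python 3's `urljoin` rejects mixed types outright, so the helper `implicit_ascii_decode` (a hypothetical name) stands in for the implicit ASCII coercion that Python 2 performs when the two types are combined. Decoding the bytes as UTF-8 before joining is an assumed workaround, not necessarily what feedparser itself does.

```python
from urllib.parse import urljoin

base = 'http://20minutos.feedsportal.com/c/32489/f/478284/index.rss'

# Raw UTF-8 bytes, copied from the "uri" local variable in the report.
uri = (b"/S&P's%20rebaja%20a%20nueve%20autonom\xc3\xadas%20su%20nota%20y%20deja"
       b"%20al%20borde%20del%20'bono%20basura'%20a%20Catalu\xc3\xb1a%20y%20Baleares")


def implicit_ascii_decode(raw):
    """Mimic Python 2's implicit str -> unicode coercion, which uses ASCII."""
    return raw.decode('ascii')


try:
    implicit_ascii_decode(uri)
    error_message = None
except UnicodeDecodeError as exc:
    # The first non-ASCII byte in the path triggers the failure.
    error_message = str(exc)

print(error_message)

# Assumed workaround: decode explicitly as UTF-8, then join two text strings.
joined = urljoin(base, uri.decode('utf-8'))
print(joined)
```

Because the relative path starts with `/`, the joined URL keeps only the scheme and host of `base` and replaces its path.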

Original issue reported on code.google.com by raul.gon...@diffindo.es on 6 May 2012 at 10:57

GoogleCodeExporter commented 9 years ago
I'm not able to reproduce this. What exact version of Python 2.6 are you using? 
Would you start the interactive interpreter from the commandline, run the 
following commands, and copy and paste the output you get?

import sys
print "sys.version = %s" % (sys.version, )
import feedparser
print "feedparser.__version__ = %s" % (feedparser.__version__, )
print "feedparser.__file__ = %s" % (feedparser.__file__, )
f = feedparser.parse('http://20minutos.feedsportal.com/c/32489/f/478284/index.rss')

If that throws an exception, please copy and paste the entire traceback.

The exception isn't occurring in feedparser: it's occurring in urlparse.py, 
which is part of the Python standard library, and since I can't reproduce the 
problem I'll need more information to determine what's going on.

Original comment by kurtmckee on 6 May 2012 at 6:00

GoogleCodeExporter commented 9 years ago
Hi kurtmckee,

Thank you for your response. The exception occurs when I parse the URL I sent 
in my previous post, specifically when feedparser tries to parse a news link 
that contains an accented character or "ñ". At the moment the RSS source has 
no URLs with these characters, so feedparser works perfectly, but whenever the 
source contains any of them the exception is raised. Anyway, here is the 
information you requested.

sys.version = 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3]

feedparser.__version__ = 4.1

feedparser.__file__ = /usr/local/lib/python2.6/dist-packages/feedparser.pyc

I am also sending you the XML data captured when the exception occurred.

Thanks in advance and best regards.

Original comment by raul.gon...@diffindo.es on 7 May 2012 at 11:42

Attachments:

GoogleCodeExporter commented 9 years ago
Hang on, you're actually running feedparser 4.1, which was released over six 
years ago! Please install the latest version of feedparser and re-run the 
script I included in my comment above.

Additionally, the xml_data.txt file you uploaded is backslash-escaped, so it's 
not an XML file.
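
As a rough illustration of what "backslash-escaped" means here, the Python 3 sketch below turns literal escape sequences such as the four characters `\xc3` back into raw bytes. It assumes the escaping is Python repr-style and that the underlying data is UTF-8; the attachment itself is not available, so the sample string is made up from words in the reported `uri`, and `unescape_backslashes` is a hypothetical helper name.

```python
def unescape_backslashes(text):
    """Convert literal escapes like \\xc3 into the raw bytes they denote.

    Assumes every character in `text` fits in Latin-1.
    """
    # unicode_escape maps \xNN to the code point U+00NN; re-encoding as
    # Latin-1 then yields the single byte 0xNN.
    return text.encode('latin-1').decode('unicode_escape').encode('latin-1')


# Hypothetical sample: backslashes are literal characters in this string.
escaped = r"autonom\xc3\xadas y Catalu\xc3\xb1a"

raw = unescape_backslashes(escaped)
decoded = raw.decode('utf-8')  # assumes the original data was UTF-8
print(decoded)
```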

Original comment by kurtmckee on 7 May 2012 at 2:14

GoogleCodeExporter commented 9 years ago
How is it possible that I'm running feedparser 4.1 when I installed the latest 
version, 5.1.2, which I downloaded from the download section? I don't understand.

How can I install the latest version of feedparser?

Thanks in advance.
Best regards.

Original comment by raul.gon...@diffindo.es on 7 May 2012 at 2:38

GoogleCodeExporter commented 9 years ago
I've installed feedparser 5.1.2 again, and that is now the version I have 
installed.

I'll try again now with the latest version.

Thanks, and sorry for the inconvenience.
Best regards.

Original comment by raul.gon...@diffindo.es on 7 May 2012 at 2:44

GoogleCodeExporter commented 9 years ago
This looks a lot like issue 303. Let me know if this is fixed with feedparser 
5.1.2 installed.

Original comment by kurtmckee on 7 May 2012 at 3:01

GoogleCodeExporter commented 9 years ago
Have you had an opportunity to try this with version 5.1.2 installed?

Original comment by kurtmckee on 10 May 2012 at 3:49

GoogleCodeExporter commented 9 years ago
I'm going to mark this as a duplicate of issue 303, but if you're still seeing 
this problem with version 5.1.2, please come back to this report and let me 
know! I'll need a URL to a site that demonstrates the problem. Thanks!

Original comment by kurtmckee on 18 May 2012 at 3:40