HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

Cannot get links from some of the items #317

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Use feedparser-5.1 to show the link in the first item of the attached rss file.

What is the expected output? What do you see instead?

The expected output is
http://www.zaman.com.tr/haber.do?haberno=1214608
but the actual output is
>>> a['items'][0]['link']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/root/feedparser-5.1/feedparser/feedparser.py", line 346, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'link'

What version of the product are you using? On what operating system?

feedparser-5.1 on a 64-bit Red Hat 5.5 server:
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
# rpm -qa python
python-2.4.3-27.el5

I got the same results on 64-bit Windows 7 SP1 with Python 2.7.2.

Please provide any additional information below.

I created the RSS file by opening the page source (right click, View Page
Source, Select All) and copying it into a text file. I may have made a mistake
while copying: the items that feedparser cannot parse also do not fold
correctly in Notepad++ (with the language set to HTML), while the other items
do.

Output on the Red Hat server

# python
Python 2.4.3 (#1, Jun 11 2009, 14:09:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> a=feedparser.parse("a.rss")
>>> a['items'][1]['link']
u'http://www.zaman.com.tr/haber.do?haberno=1214609'
>>> a['items'][0]['link']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "feedparser.py", line 346, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'link'
>>> a['items'][13]['link']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "feedparser.py", line 346, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'link'
>>> a['items'][12]['link']
u'http://www.zaman.com.tr/haber.do?haberno=1214625'

Output on Windows PC

>python
Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> a=feedparser.parse("a.rss")
>>> a['items'][1]['link']
u'http://www.zaman.com.tr/haber.do?haberno=1214609'
>>> a['items'][0]['link']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "feedparser.py", line 346, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'link'
>>> a['items'][13]['link']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "feedparser.py", line 346, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'link'
>>> a['items'][12]['link']
u'http://www.zaman.com.tr/haber.do?haberno=1214625'

Thanks,
Altaz

Original issue reported on code.google.com by fresk...@gmail.com on 14 Dec 2011 at 8:34

GoogleCodeExporter commented 9 years ago
This is happening because the feed is not well-formed. In particular, the very 
first item contains the following line:

    <title>Fenerbahçde Emre iç kritik gütle>

Based on the XML around it, that line probably should have looked something 
like this:

    <title>Fenerbahçde Emre iç kritik gü...</title>

Unfortunately, because the feed is not well-formed, it's impossible to know 
exactly what the feed publisher meant. Feedparser tries to recover from the 
error, but the result is imperfect (which in this case means that the first 
entry's permalink isn't available).
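
To make the well-formedness problem concrete, here is a minimal sketch that uses only the Python standard library, not feedparser's own recovery logic. The `is_well_formed` helper and the two sample strings are hypothetical stand-ins; they are not taken from the attached feed, and they only imitate the shape of the broken `<title>` line above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts xml_text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Stand-in data: the <title> element is never closed, so the parser is still
# inside <title> when it reaches </item> and reports a mismatched tag.
broken_item = (
    "<item><title>Example headline tle>"
    "<link>http://example.com/1</link></item>"
)

# The same item with the closing </title> tag restored.
fixed_item = (
    "<item><title>Example headline</title>"
    "<link>http://example.com/1</link></item>"
)

print(is_well_formed(broken_item))  # False
print(is_well_formed(fixed_item))   # True
```

A strict parser simply rejects the broken item, which is why a lenient parser has to guess where the title ends. In feedparser itself, the `bozo` flag on the parse result reports that a feed was not well-formed.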

If your software crashed because of this, you can check for the existence of 
the key before accessing it:

    if 'link' in a['items'][0]:
        # do something with the permalink, for example:
        print(a['items'][0]['link'])
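
The same check can be applied to every entry at once. The following is a sketch under stated assumptions: `collect_links` is a hypothetical helper, and `sample_entries` is stand-in data made of plain dictionaries shaped like `a['items']`, so the example runs without feedparser installed. It relies only on entries behaving like dictionaries.

```python
def collect_links(entries):
    """Split entries into found links and the indexes of entries without one.

    Hypothetical helper: `entries` is any sequence of dict-like objects.
    """
    links = []
    missing = []
    for index, entry in enumerate(entries):
        link = entry.get('link')  # None instead of KeyError when absent
        if link:
            links.append(link)
        else:
            missing.append(index)
    return links, missing


# Stand-in data: the first entry has no 'link' key, like the broken item
# in this report; the second entry carries a permalink.
sample_entries = [
    {'title': 'Entry whose link could not be recovered'},
    {'link': 'http://www.zaman.com.tr/haber.do?haberno=1214609'},
]

links, missing = collect_links(sample_entries)
print(links)
print(missing)
```

With a real parse result, passing `a['items']` instead of `sample_entries` should work the same way, and the `missing` list shows which entries to inspect in the feed source.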

The problem lies with the feed publishing software, and I don't see a better 
way to recover from the error, so I'm going to close this report.

Original comment by kurtmckee on 15 Dec 2011 at 8:41