aurelg / feedspora

FeedSpora posts RSS/Atom feeds to your social network accounts.

Error reading vimeo feed --> KeyError: 'medium' #73

Open Strubbl opened 4 years ago

Strubbl commented 4 years ago

Hi, I want to read my Vimeo feed with feedspora, but I get an error:

INFO:root:Found database file feedspora.db
INFO:root:Trying to read https://vimeo.com/strubbl/likes/rss as a file.
INFO:root:File not found.
INFO:root:Trying to read https://vimeo.com/strubbl/likes/rss as a URL.
INFO:root:Feed read.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/feedspora/__main__.py", line 100, in <module>
    main()
  File "/usr/local/lib/python3.6/site-packages/feedspora/__main__.py", line 96, in main
    feedspora.run()
  File "/usr/local/lib/python3.6/site-packages/feedspora/feedspora_runner.py", line 243, in run
    entry_count = self._process_feed(entry_count, feed)
  File "/usr/local/lib/python3.6/site-packages/feedspora/feedspora_runner.py", line 209, in _process_feed
    for entry in entry_generator:
  File "/usr/local/lib/python3.6/site-packages/feedspora/generic_feed.py", line 276, in parse_rss
    fse.media_url = self.find_rss_image_url(entry, fse.link)
  File "/usr/local/lib/python3.6/site-packages/feedspora/generic_feed.py", line 210, in find_rss_image_url
    entry.find('media:content')['medium'] == 'image':
  File "/usr/local/lib/python3.6/site-packages/bs4/element.py", line 997, in __getitem__
    return self.attrs[key]
KeyError: 'medium'
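The `KeyError` comes from `entry.find('media:content')['medium']`. Indexing a BeautifulSoup tag reads its attribute dict (the `self.attrs[key]` frame above), and `medium` is an optional attribute in Media RSS, so a `media:content` element without it raises. The Vimeo feed's `media:content` presumably omits it. Below is a minimal sketch of a tolerant lookup. It uses the standard library's ElementTree instead of BeautifulSoup so it runs without extra packages. The helper name `find_media_image_url` and the fallback on an `image/*` MIME type are assumptions, not feedspora's code. In BeautifulSoup the equivalent change would be `tag.get('medium')`, which returns `None` like `dict.get`.

```python
import xml.etree.ElementTree as ET

# Media RSS namespace, normally bound to the "media:" prefix in feeds
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}


def find_media_image_url(item):
    """Return the URL of the first image media:content in an RSS <item>, or None.

    Hypothetical helper: attributes are read with .get(), so a media:content
    element without a 'medium' attribute is skipped instead of raising KeyError.
    """
    for content in item.findall("media:content", MEDIA_NS):
        medium = content.get("medium")
        mime_type = content.get("type", "")
        # Assumption: when 'medium' is absent, treat an image/* MIME type as an image
        if medium == "image" or (medium is None and mime_type.startswith("image/")):
            return content.get("url")
    return None
```

With this approach, a Vimeo item whose `media:content` has no `medium` attribute (and no image MIME type) simply yields `None`, and the entry can be posted without an image instead of aborting the whole run.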
manumacron commented 3 years ago

I have an issue that may be related. Everything worked fine with the first feed, but the other two fail with the same error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/feedspora/__main__.py", line 100, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/feedspora/__main__.py", line 96, in main
    feedspora.run()
  File "/usr/local/lib/python3.8/dist-packages/feedspora/feedspora_runner.py", line 243, in run
    entry_count = self._process_feed(entry_count, feed)
  File "/usr/local/lib/python3.8/dist-packages/feedspora/feedspora_runner.py", line 209, in _process_feed
    for entry in entry_generator:
  File "/usr/local/lib/python3.8/dist-packages/feedspora/generic_feed.py", line 260, in parse_rss
    fse.published_date = entry.find('pubdate').text
AttributeError: 'NoneType' object has no attribute 'text'

These two feeds are generated by the same CMS (SPIP). Here is the relevant part of generic_feed.py:

# Link
fse.link = entry.find('link').text

# Content takes priority over Description

if entry.find('content'):
    fse.content = entry.find('content')[0].text.strip()
else:
    fse.content = entry.find('description').text.strip()

# PubDate
fse.published_date = entry.find('pubdate').text

fse.tags = dict()
# Tags from title and content, each in their own list
fse.tags['title'], fse.tags['content'] = self.get_tag_lists(
    fse.title, fse.content)

# Add tags from category
fse.tags['category'] = []
for tag in entry.find_all('category'):
    new_tag = tag.text.replace(' ', '_').strip()

    if new_tag not in fse.tags['category']:
        fse.tags['category'].append(new_tag)
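Each `entry.find(...).text` in this fragment raises `AttributeError` as soon as the element is missing, which matches the `pubdate` traceback above. Here is a hedged sketch of the link, description, pubDate and category steps with fallbacks. It uses ElementTree, which is case-sensitive, hence `pubDate`, and a hypothetical `parse_item()` helper that returns a plain dict rather than feedspora's entry object. Separately, in BeautifulSoup `tag[0]` looks up an attribute named `0` (compare the `self.attrs[key]` frame in the first traceback), so the content branch probably meant `entry.find('content').text`.

```python
import xml.etree.ElementTree as ET


def parse_item(item):
    """Read link, description, pubDate and categories from one RSS <item>.

    Hypothetical helper: every lookup tolerates a missing element instead of
    calling .text on None.
    """
    # findtext() returns the default ("" here) when the element is absent
    link = item.findtext("link", default="").strip()
    description = item.findtext("description", default="").strip()

    # No pubDate: keep None so the caller can decide what to do
    published_date = item.findtext("pubDate")

    # Categories: strip first so surrounding whitespace does not become
    # underscores, replace inner spaces, skip empties and duplicates
    categories = []
    for cat in item.findall("category"):
        tag = (cat.text or "").strip().replace(" ", "_")
        if tag and tag not in categories:
            categories.append(tag)

    return {
        "link": link,
        "content": description,
        "published_date": published_date,
        "categories": categories,
    }
```

With SPIP items that have no date, `published_date` would simply come back as `None`. Whether feedspora should then skip the entry or fall back to another date is a design choice for the maintainers.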

As far as I understand, it may be caused by data that feedspora expects but that is missing from the feed. In my case, these publications have no dates.
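To confirm which items lack the elements feedspora reads, a quick standard-library check along these lines could be run against a saved copy of each SPIP feed. The `EXPECTED` list is an assumption drawn from the tracebacks and the quoted code, not feedspora's own validation, and `missing_fields()` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumption: elements read by the quoted parse_rss code and the tracebacks
EXPECTED = ("title", "link", "description", "pubDate")


def missing_fields(rss_text):
    """Map each item's 1-based position to the expected elements it lacks."""
    root = ET.fromstring(rss_text)
    report = {}
    for index, item in enumerate(root.iter("item"), start=1):
        missing = [name for name in EXPECTED if item.find(name) is None]
        if missing:
            report[index] = missing
    return report
```

An empty result means every item has the expected elements. If every entry lists `pubDate`, that would support the missing-date explanation.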