ecprice / newsdiffs

Automatic scraper that tracks changes in news articles over time.

TypeError: sequence item 0: expected string or Unicode, NoneType found #35

Open ryantate opened 9 years ago

ryantate commented 9 years ago

While poking around to build my own parser, I followed the tips in the README and keep getting a TypeError whenever I run the BBC parser on a story. The same happens with CNN. (The other parsers seem fine, although Tagesschau returns no story URLs from test_parser.py tagesschau.TagesschauParser.)

For BBC, which is the parser used in the README, I tried both the URL from the README and a fresh URL fetched by test_parser.py bbc.BBCParser. I get the same error either way:

ryantate@ryantate:~/dist/python/newsdiffs$ python parsers/test_parser.py bbc.BBCParser http://www.bbc.co.uk/news/uk-21649494
Traceback (most recent call last):
  File "parsers/test_parser.py", line 29, in <module>
    print unicode(parsed_article)
  File "/home/ryantate/dist/python/newsdiffs/parsers/baseparser.py", line 138, in __unicode__
    self.body,)))
TypeError: sequence item 0: expected string or Unicode, NoneType found
ryantate@ryantate:~/dist/python/newsdiffs$ python parsers/test_parser.py bbc.BBCParser http://www.bbc.co.uk/news/technology-34044506
Traceback (most recent call last):
  File "parsers/test_parser.py", line 29, in <module>
    print unicode(parsed_article)
  File "/home/ryantate/dist/python/newsdiffs/parsers/baseparser.py", line 138, in __unicode__
    self.body,)))
TypeError: sequence item 0: expected string or Unicode, NoneType found
ryantate@ryantate:~/dist/python/newsdiffs$ 

CNN:

ryantate@ryantate:~/dist/python/newsdiffs$ python parsers/test_parser.py cnn.CNNParser http://edition.cnn.com/2015/08/24/sport/vincenzo-nibali-tour-of-spain/index.html
Traceback (most recent call last):
  File "parsers/test_parser.py", line 29, in <module>
    print unicode(parsed_article)
  File "/home/ryantate/dist/python/newsdiffs/parsers/baseparser.py", line 138, in __unicode__
    self.body,)))
TypeError: sequence item 0: expected string or Unicode, NoneType found
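
The message matches what join raises when the first element of the sequence it is given is None. So presumably one of the fields the parser is supposed to fill in (item 0 of whatever gets joined at baseparser.py line 138) is never set for these pages. Below is a minimal sketch of that failure mode, plus a check for which fields are unset. FakeArticle, FIELDS, missing_fields and safe_text are hypothetical names for illustration only and are not taken from newsdiffs; the field list is a guess, not a copy of what baseparser.py actually joins.

```python
# Hypothetical stand-in for a parsed article. The field names are an
# assumption for illustration, not copied from baseparser.py.
class FakeArticle(object):
    def __init__(self, date=None, title=None, byline=None, body=None):
        self.date = date
        self.title = title
        self.byline = byline
        self.body = body


# Assumed order of the fields that get joined into the article text.
FIELDS = ('date', 'title', 'byline', 'body')


def missing_fields(article):
    """Return the names of fields that are None and would break a join."""
    return [name for name in FIELDS if getattr(article, name) is None]


def safe_text(article):
    """Join the fields, substituting an empty string for any None."""
    return u'\n'.join(getattr(article, name) or u'' for name in FIELDS)


# A parse that found a title and a body but no date or byline.
article = FakeArticle(title=u'Some headline', body=u'Some body text')

# Joining the raw fields fails on the first None (item 0 here).
try:
    u'\n'.join((article.date, article.title, article.byline, article.body))
    join_error = None
except TypeError as exc:
    join_error = str(exc)

print(join_error)
print(missing_fields(article))
```

Running something like missing_fields on the parsed article before printing it would show which field the BBC and CNN parsers are failing to populate, which is probably a sign that the page markup no longer matches what the parser looks for.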