JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

AttributeError on extracting tweet date #35

Closed: jodizzle closed this issue 5 years ago

jodizzle commented 5 years ago

Here's the error, from the tail end of a log file with increased verbosity:

2019-05-02 00:33:05.481  INFO  snscrape.modules.twitter  Retrieving scroll page TWEET-1103001670152269824-1123744738421747713
2019-05-02 00:33:05.482  INFO  snscrape.base  Retrieving https://twitter.com/i/search/timeline?f=tweets&vertical=default&lang=en&q=%23MaduroRegime&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&qf=off&max_position=TWEET-1103001670152269824-1123744738421747713
2019-05-02 00:33:05.482  DEBUG  snscrape.base  ... with headers: {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.2584.142 Safari/537.36'}
2019-05-02 00:33:05.953  DEBUG  urllib3.connectionpool  https://twitter.com:443 "GET /i/search/timeline?f=tweets&vertical=default&lang=en&q=%23MaduroRegime&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&qf=off&max_position=TWEET-1103001670152269824-1123744738421747713 HTTP/1.1" 200 19292
2019-05-02 00:33:05.956  DEBUG  snscrape.base  https://twitter.com/i/search/timeline?f=tweets&vertical=default&lang=en&q=%23MaduroRegime&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&qf=off&max_position=TWEET-1103001670152269824-1123744738421747713 retrieved successfully
Traceback (most recent call last):
  File "/home/user/env/bin/snscrape", line 11, in <module>
    load_entry_point('snscrape==0.1.3', 'console_scripts', 'snscrape')()
  File "/home/user/env/lib/python3.7/site-packages/snscrape-0.1.3-py3.7.egg/snscrape/cli.py", line 83, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/home/user/env/lib/python3.7/site-packages/snscrape-0.1.3-py3.7.egg/snscrape/modules/twitter.py", line 83, in get_items
    yield from self._feed_to_items(feed)
  File "/home/user/env/lib/python3.7/site-packages/snscrape-0.1.3-py3.7.egg/snscrape/modules/twitter.py", line 38, in _feed_to_items
    date = datetime.datetime.fromtimestamp(int(tweet.find('a', 'tweet-timestamp').find('span', '_timestamp')['data-time']), datetime.timezone.utc)
AttributeError: 'NoneType' object has no attribute 'find'

The hashtag being collected was 'MaduroRegime'. The error seems to be reproducible, at least on my end and within a short time frame.
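The failing line chains two lookups. In BeautifulSoup, which the find(name, class) call style suggests, find() returns None when nothing matches, so a tweet item without an <a class="tweet-timestamp"> element makes the second .find() raise AttributeError on None. Below is a minimal None-safe sketch of the same lookup. It uses the standard library's html.parser so it runs without BeautifulSoup. The helper names (extract_tweet_date, _TimestampFinder) are hypothetical, and this is not necessarily how the maintainer's fix handles the problem.

```python
import datetime
from html.parser import HTMLParser
from typing import Optional


class _TimestampFinder(HTMLParser):
    """Records the data-time attribute of the first <span class="_timestamp">
    nested inside an <a class="tweet-timestamp">, mirroring the chained
    find() calls from the traceback (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_anchor = False
        self.data_time: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # class may hold several space-separated names, or be valueless (None)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'tweet-timestamp' in classes:
            self._in_anchor = True
        elif (tag == 'span' and self._in_anchor and self.data_time is None
              and '_timestamp' in classes):
            self.data_time = attrs.get('data-time')

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_anchor = False


def extract_tweet_date(tweet_html: str) -> Optional[datetime.datetime]:
    """Return the tweet's UTC datetime, or None if the markup lacks it."""
    finder = _TimestampFinder()
    finder.feed(tweet_html)
    finder.close()
    if finder.data_time is None or not finder.data_time.isdigit():
        return None  # element or attribute missing: let the caller decide
    return datetime.datetime.fromtimestamp(int(finder.data_time), datetime.timezone.utc)
```

Returning None instead of raising lets the scraping loop decide whether to skip the item, log it, or fall back to another source of the date.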

Fusl commented 5 years ago

Same error with twitter-hashtag strache and with twitter-search "strache video":

2019-05-18 08:04:46.241  CRITICAL  snscrape.cli  Local variables logged to /tmp/snscrape_locals_3n8bef1r
Traceback (most recent call last):
  File "/usr/local/bin/snscrape", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/snscrape/cli.py", line 111, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/usr/local/lib/python3.7/site-packages/snscrape/modules/twitter.py", line 115, in get_items
    yield from self._feed_to_items(feed)
  File "/usr/local/lib/python3.7/site-packages/snscrape/modules/twitter.py", line 51, in _feed_to_items
    date = datetime.datetime.fromtimestamp(int(tweet.find('a', 'tweet-timestamp').find('span', '_timestamp')['data-time']), datetime.timezone.utc)
AttributeError: 'NoneType' object has no attribute 'find'

JustAnotherArchivist commented 5 years ago

This should be fixed by 7989af27. At least I can't reproduce it with the examples given. Please reopen if it recurs.
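If a tweet's timestamp markup is missing, one possible fallback is to decode the date from the tweet ID. Nothing in this thread says snscrape adopted this approach. Twitter's Snowflake IDs store the milliseconds since a custom epoch (1288834974657 ms, i.e. 2010-11-04 01:42:54.657 UTC) in the bits above the lowest 22. The sketch below assumes the item's numeric tweet ID is available. date_from_tweet_id is a hypothetical helper name.

```python
import datetime

# Twitter's Snowflake epoch (2010-11-04T01:42:54.657Z), in Unix milliseconds.
TWITTER_EPOCH_MS = 1288834974657
_UNIX_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def date_from_tweet_id(tweet_id: int) -> datetime.datetime:
    """Derive a tweet's creation time (UTC, millisecond precision) from its ID.

    The lowest 22 bits of a Snowflake ID hold worker and sequence numbers;
    the bits above them hold milliseconds since TWITTER_EPOCH_MS. Only
    meaningful for IDs issued after Snowflake was introduced in late 2010.
    """
    if tweet_id < 0:
        raise ValueError('tweet IDs are non-negative')
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return _UNIX_EPOCH + datetime.timedelta(milliseconds=ms)
```

Applied to 1123744738421747713, one of the numbers in the first log's scroll cursor, this gives a datetime in early May 2019, consistent with that log's timestamps.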