Mottl / GetOldTweets3

A Python 3 library and a corresponding command line utility for accessing old tweets
MIT License
365 stars, 127 forks

Download stops after a lot of tweets #3

Open JaimeBadiola opened 5 years ago

JaimeBadiola commented 5 years ago

I tried to download tweets with --querysearch 'bitcoin' since 2018-02-18 until 2018-02-19. The issue is that the script stopped before reaching the date given by the until parameter.
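For reference, the invocation described above can be written out explicitly. This is only a sketch: the helper name build_command is hypothetical, and it assumes the report used the tool's --querysearch, --since and --until options with exactly the values quoted in the comment.

```python
import shlex


def build_command(query, since, until):
    """Assemble the GetOldTweets3 argument list for a date-bounded search.

    Hypothetical helper for illustration; it only builds the arguments
    and does not run anything.
    """
    return [
        "GetOldTweets3",
        "--querysearch", query,
        "--since", since,   # first day to include
        "--until", until,   # upper bound of the date range
    ]


cmd = build_command("bitcoin", "2018-02-18", "2018-02-19")
# shlex.join quotes arguments only where the shell would need it.
print(shlex.join(cmd))
```

Building the arguments as a list keeps the quoting of the search term separate from the rest of the command, which makes it easier to check what was actually passed.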

The log was too big to post in full, so I removed the entries for the first 31,000 tweets.

You can find the log here

Could this be because Twitter detects a bot downloading a lot of tweets?

lprayaga commented 4 years ago

I am having trouble with GetOldTweets3 on my Mac. I can install it and run the command GetOldTweets3 -h and get all the examples.

But if I try any other command, like GetOldTweets3 --querysearch "bitcoin" --lang cn --maxtweets 10

then I cannot get it to work. It was working until today and I made no changes, but now I get this error. If anyone has ideas, please share.

Downloading tweets...
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyquery/pyquery.py", line 96, in fromstring
    result = getattr(etree, meth)(context)
  File "src/lxml/etree.pyx", line 3222, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1758, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 2
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 2, column 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/bin/GetOldTweets3", line 209, in main
    got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer, debug=debug)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/GetOldTweets3/manager/TweetManager.py", line 70, in getTweets
    scrapedTweets = PyQuery(json['items_html'])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyquery/pyquery.py", line 256, in __init__
    elements = fromstring(context, self.parser)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyquery/pyquery.py", line 100, in fromstring
    result = getattr(lxml.html, meth)(context)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/lxml/html/__init__.py", line 763, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

Document is empty

Done. Output file generated "output_got.csv".
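
For anyone reading the traceback: the failure is raised at PyQuery(json['items_html']), and "Document is empty" is what lxml reports when it is handed an empty or whitespace-only string. One way to see whether that is the case is to check the fragment before parsing it. The sketch below uses only the standard library. The helper name extract_items_html is hypothetical and is not part of GetOldTweets3; only the items_html key is taken from the traceback. It also assumes the response body is JSON text.

```python
import json


def extract_items_html(response_text):
    """Return the 'items_html' fragment, or None if it is missing or blank.

    Hypothetical helper: it checks the payload before any HTML parsing,
    so an empty fragment is reported as None instead of raising a
    ParserError later on.
    """
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return None  # the body was not JSON at all
    if not isinstance(payload, dict):
        return None
    html = payload.get("items_html")
    if not isinstance(html, str) or not html.strip():
        return None  # absent, wrong type, or whitespace only
    return html


# A blank fragment, like the one the traceback suggests, yields None.
print(extract_items_html('{"items_html": "\\n"}'))
# A fragment with markup is returned unchanged.
print(extract_items_html('{"items_html": "<li>tweet</li>"}'))
```

A None result here would mean the server returned no tweet markup for the request. That would be consistent with the error above, but it does not by itself explain why the response was empty.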