I am almost done. My query #devcon3 has returned > 11000 tweets from the past ~6 days, and it has downloaded > 900 images, most of which were not duplicates. (Still, I had to remove > 50 identical images manually with xnview, and I don't know how they got in; is the not item.has_key('retweeted_status') check sometimes simply incorrect?)
The query was this:
python -u -m TwitterGeoPics.SearchOldTweets -words '#devcon3' -stalk \
-no_images_of_retweets -photo_dir ./photos/hashtag_devcon3 \
-oauth twitterapi-oauth.txt | tee -a photos/hashtag-devcon3_tweets.txt
and I ended it manually when it had reached the 30th of October.
So ... I am done. I might not immediately benefit from any improvements, but perhaps one day I will come back and use your tool again. Anyway ... while using it I had some ideas for possible extensions ...
ideas / suggestions / feature requests:
do not re-download images which are already here. Easy: check whether that image file already exists and has size > 0.
do not re-download tweets. That would need a DB of already downloaded tweet IDs, but it would then allow restarting the script with the same parameters to continue where it left off, or to fill in the blanks where connection errors caused a tweet to be skipped.
-time_from 20171030-130000 -time_to 20171105-235959 switches for limiting the datetime range
multi-threading, with a clever delay so as not to exceed the request rate that Twitter allows. Perhaps this helps.
(this becomes possible with threading) retries: if a download failed, sleep 10 seconds, then retry, at most 2 more times; only then give up.
generate an HTML file/table/div with all tweets and linked/embedded images (I have already made the log output parseable with the USER TWEET: DATE: GEOCODE: IMAGE: line beginnings, so that HTML file could actually also be generated by a second tool).
find out why there were still ~50 duplicate images among the 950. (Unfortunately I have now deleted them; just re-download with the same command above and you'll get the duplicates.)
some tweets are getting truncated.
what about video links?
I only got .JPG files; aren't there any PNG / GIF files?
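
To make a few of the suggestions above more concrete, here is a minimal sketch of the skip-existing-images check, the DB of already downloaded tweet IDs, the retry loop, and the parsing of the proposed -time_from / -time_to values. It is not taken from TwitterGeoPics: all function names, the `seen` table and the way these pieces would be wired into the tool are assumptions made for illustration only.

```python
import os
import sqlite3
import time
import urllib.request
from datetime import datetime


def image_already_downloaded(path):
    """True if the image file exists and has size > 0."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def open_seen_db(db_path):
    """Open (or create) a small SQLite DB holding already handled tweet IDs.

    The table name 'seen' is hypothetical.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (tweet_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def mark_seen(conn, tweet_id):
    """Record a tweet ID; return False if it had been recorded before."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen (tweet_id) VALUES (?)", (str(tweet_id),)
    )
    conn.commit()
    return cur.rowcount == 1


def with_retries(action, retries=2, delay=10.0, sleep=time.sleep):
    """Run action(); on OSError sleep `delay` seconds and retry.

    At most `retries` more attempts are made, then the error is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return action()
        except OSError:  # urllib's URLError is a subclass of OSError
            if attempt == retries:
                raise
            sleep(delay)


def fetch_image(url, path):
    """Download url to path unless it is already there; True if downloaded."""
    if image_already_downloaded(path):
        return False

    def download():
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = resp.read()
        with open(path, "wb") as f:
            f.write(data)

    with_retries(download)
    return True


def parse_switch_time(text):
    """Parse a value such as '20171030-130000' (format of the proposed switches)."""
    return datetime.strptime(text, "%Y%m%d-%H%M%S")
```

With something like this, a restarted run could call mark_seen() for every tweet and skip those for which it returns False, and call fetch_image() for every image URL without re-downloading files that are already in the photo directory.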