Data4Democracy / discursive

Twitter topic search and indexing with Elasticsearch

Poll User Data daily #30

Open tdraebing opened 7 years ago

tdraebing commented 7 years ago

This PR implements a daily poll of Twitter user data as described in issue #16.

With this change, running the Docker instance not only indexes tweets on the specified topics but also fetches user data once a day. The timing is achieved by extending the original crontab script. The index_user_profiles.py script takes a file that lists, one per line, the IDs of the users whose data should be pulled. It uses tweepy's lookup_users() method to fetch the data from Twitter's REST API and hands it to Elasticsearch for indexing.
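As a rough illustration of that flow (a sketch, not the actual contents of index_user_profiles.py), the steps could look something like the following. The helper names `read_user_ids`, `batches`, and `fetch_profiles` are hypothetical, and the batch size of 100 assumes the per-request limit of Twitter's users/lookup endpoint. The `lookup` argument stands in for a thin wrapper around tweepy's lookup_users(), so the sketch itself does not depend on tweepy or on credentials.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def read_user_ids(path: str) -> List[str]:
    """Read one user ID per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def batches(items: Iterable, size: int = 100) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fetch_profiles(
    lookup: Callable[[list], Iterable[dict]],
    user_ids: Iterable[str],
    batch_size: int = 100,  # assumed limit per lookup request
) -> List[dict]:
    """Call `lookup` once per batch of IDs and collect the returned profiles.

    `lookup` is a hypothetical callable: it receives a list of user IDs and
    returns one dict per user, ready to be handed to the indexing step.
    """
    profiles: List[dict] = []
    for batch in batches(user_ids, batch_size):
        profiles.extend(lookup(batch))
    return profiles
```

In practice `lookup` would wrap the call to lookup_users() on an authenticated tweepy API object and convert each returned user into a dict. The keyword argument that carries the ID list has differed between tweepy releases, so it is worth checking against the installed version.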

The following features are extracted from the full set of user features provided by the Twitter API:

If you have further suggestions or found bugs, I would be happy to deal with those as well.

Cheers, Thomas

hadoopjax commented 7 years ago

Thanks @tdraebing ! Let me pull this down and play with it over the next couple days but thanks a ton for the work! I'll get back with a review no later than Thursday.