Data4Democracy / discursive

Twitter topic search and indexing with Elasticsearch

Initial commit of Issue 7: Writing json file to s3 bucket #8

Closed: ASRagab closed this 7 years ago

ASRagab commented 7 years ago

Initial work on dumping the tweets indexed to Elasticsearch into an S3 bucket:

  1. Each time on_status is called, the tweet is appended to the tweet_list attribute in addition to being indexed into Elasticsearch (ES).
  2. When the tweet limit in the StreamListener is reached, the list is dumped to a file on the local file system.
  3. A call to S3 is made to create or retrieve the bucket that will hold the folders.
  4. The file is then picked up and a key is generated from a timestamp, producing a pseudo-folder structure in S3. This structure can be used to download multiple files simultaneously.
  5. The file is written to S3.
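The five steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR. BufferingListener, dump_to_local_file, make_s3_key, upload_to_s3, the index_fn hook, and the "tweets" key prefix are all hypothetical names. The listener is a plain class standing in for a tweepy StreamListener subclass; returning False from on_status is how a tweepy 3.x listener asks the stream to disconnect. The S3 side assumes boto3 (the PR may have used a different client). Note that outside us-east-1, create_bucket also needs a CreateBucketConfiguration with a LocationConstraint.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


class BufferingListener:
    """Hypothetical stand-in for the PR's StreamListener subclass (steps 1-2)."""

    def __init__(self, tweet_limit, index_fn=None):
        self.tweet_limit = tweet_limit
        self.index_fn = index_fn  # assumed hook, e.g. a wrapper around an ES index call
        self.tweet_list = []

    def on_status(self, status):
        # Index into ES (if a hook is given) and also keep the tweet in memory.
        if self.index_fn is not None:
            self.index_fn(status)
        self.tweet_list.append(status)
        # False once the limit is reached, mirroring a tweepy 3.x disconnect signal.
        return len(self.tweet_list) < self.tweet_limit


def dump_to_local_file(tweets, directory):
    """Write the buffered tweets to a JSON file on local disk and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json", dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(tweets, fh)
    return path


def make_s3_key(now=None, prefix="tweets"):
    """Build a timestamped key; the slashes act as pseudo-folders in S3 (step 4)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime(f"{prefix}/%Y/%m/%d/%H/%Y%m%dT%H%M%S%fZ.json")


def upload_to_s3(path, bucket, key, s3_client=None):
    """Create the bucket if it is missing, then upload the file (steps 3 and 5)."""
    if s3_client is None:
        import boto3  # imported lazily so the helpers above work without boto3

        s3_client = boto3.client("s3")
    existing = {b["Name"] for b in s3_client.list_buckets()["Buckets"]}
    if bucket not in existing:
        s3_client.create_bucket(Bucket=bucket)
    s3_client.upload_file(path, bucket, key)
    return key
```

Keying by year/month/day/hour means everything collected in one window shares a prefix, so a downloader can list that prefix and fetch the files in parallel, which matches the motivation in step 4. Passing s3_client in explicitly also lets the upload path be exercised with a stub instead of a live bucket.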