MichaelCurrin / twitterverse

Store and report on Twitter conversations, from tweets to trending topics 🌍 🐦 🐍
https://michaelcurrin.github.io/twitterverse/
MIT License

Rewrite simpler as new repos #94

Open MichaelCurrin opened 4 years ago

MichaelCurrin commented 4 years ago

This might be a long-term thing, as I should focus on getting use out of this before a major refactor or redesign.

The current repo has over 100 files, and its structure and purpose have changed over time.

Small refactors can make things break and take a while to untangle. I don't need to replicate all functionality.

Simple is easier to build, add to, and fix, and easier to have confidence in.

The new repo should do only trends or tweets. Use MySQL or Mongo. Work natively in the cloud and have a way to run a Lambda on a schedule, so as to never miss data and to scale. Rerun failures later in the day, though.

Anything that does streaming or follower scraping can be a standalone repo, perhaps plugged together with the tweet repo, even if that only means passing data files as input and output rather than sharing code between the two. The old twitterverse can always hang around with that code. No rush to move it over to its own repo.

Move to a config approach. Weigh up the trade-offs first. Placejob history may not be needed at all.

Insert into the DB. Maybe use a transaction or bulk insert for speed. Maybe use a queue, although that is not necessary if it's a single Lambda script.
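A minimal sketch of the bulk insert idea, assuming a hypothetical `tweet` table with `id` and `text` columns. It uses the standard library `sqlite3` module only so the example runs anywhere; a MySQL driver would follow the same DB-API pattern, though placeholder style and the duplicate-handling syntax would differ.

```python
import sqlite3


def insert_tweets(conn, rows):
    """Insert all rows in a single transaction using one bulk call."""
    # The connection context manager commits on success and rolls back
    # if an exception is raised, so a batch is never half-written.
    with conn:
        conn.executemany(
            # INSERT OR IGNORE is SQLite syntax: rows with an existing id are skipped.
            "INSERT OR IGNORE INTO tweet (id, text) VALUES (?, ?)",
            rows,
        )


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, text TEXT)")

insert_tweets(conn, [(1, "hello"), (2, "world"), (2, "duplicate id")])
count = conn.execute("SELECT COUNT(*) FROM tweet").fetchone()[0]
```

Skipping duplicates at insert time means the same batch can be rerun after a failure without extra cleanup.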

Use the Python interpreter or SQL to view or edit records where possible, to avoid util scripts, though maybe scripts are worth it.

Note that while Lambda and MySQL can handle multiple searches and inserts at the same time, there is still an API limit. So a cron sequence or a queue might be fine.
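A rough sketch of the queue option: work through the searches one at a time and leave a minimum gap between API calls. The `search` callable and the `min_interval` value are assumptions for illustration; the real gap would depend on the API's published rate limit.

```python
import time
from collections import deque


def run_sequentially(queries, search, min_interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Run each query in order, waiting so calls are at least min_interval apart.

    `search` is a hypothetical callable that performs one API request.
    `sleep` and `clock` are injectable so the function can be tested
    without actually waiting.
    """
    pending = deque(queries)
    results = {}
    last_call = None
    while pending:
        query = pending.popleft()
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[query] = search(query)
    return results
```

For example, `run_sequentially(["#python", "#aws"], search=my_search, min_interval=2.0)` would make two calls at least two seconds apart, where `my_search` stands in for whatever function wraps the API request.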

Also see how to log errors, or store them in S3, for records that were not inserted.
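One possible shape for this, sketched under assumptions: log the error and append the failed record to a JSON Lines file so it can be rerun later in the day. A local file stands in for S3 here to keep the example self-contained; `insert` is a hypothetical callable that writes one record to the DB, and records are assumed to be dicts.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("ingest")


def insert_or_park(record, insert, failed_path):
    """Try to insert a record; on failure, log it and park it for a later rerun."""
    try:
        insert(record)
        return True
    except Exception:
        # Log with traceback, then append the record as one JSON line.
        logger.exception("Insert failed for record %s", record.get("id"))
        with Path(failed_path).open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return False


def load_parked(failed_path):
    """Read back the parked records, or an empty list if none were stored."""
    path = Path(failed_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A later scheduled run could call `load_parked` and pass each record through `insert_or_park` again, writing to a fresh file.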