Call-for-Code-for-Racial-Justice / Five-Fifths-Voter

Five Fifths Voter is a web application tool designed to enable and empower Black people and others to exercise their right to vote by ensuring their voice is heard
Apache License 2.0

Twitter code needs some performance improvements #31

Closed davidnixon closed 4 years ago

davidnixon commented 4 years ago

Clicking the button to see tweets works but results are delayed by about 30 seconds in my local testing. I did a very, very light analysis and it looks like the NLU and the Tone Analyzer both take a bit more than ½ second each and with 25 tweets that adds up to about the 30 seconds.
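As a rough check of that arithmetic, here is a minimal sketch of the estimate and of a timing helper for measuring each call. The function names and the 0.6 s figure are illustrative assumptions (standing in for "a bit more than ½ second"), not values or code taken from the project.

```python
import time


def estimate_sequential_seconds(n_tweets, per_call_seconds, calls_per_tweet=2):
    """Back-of-envelope total when every API call runs one after another.

    calls_per_tweet=2 assumes one NLU call and one Tone Analyzer call per tweet.
    """
    return n_tweets * calls_per_tweet * per_call_seconds


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) to measure a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # 25 tweets, 2 calls each, an assumed 0.6 s per call.
    print(f"estimated total: {estimate_sequential_seconds(25, 0.6):.1f} s")
```

Wrapping the real NLU and Tone Analyzer calls in `timed` would confirm whether the delay is dominated by those two requests.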

Possible solutions:

Maybe the Watson docs have guidelines on performance?

@drealuc Do you have an opinion?

drealuc commented 4 years ago

Marek will provide a recommendation to resolve the performance issue.

blumareks commented 4 years ago

@drealuc @davidnixon my suggestion would be to use a fan-out pattern (which might be implemented with threads) to run the calls in parallel, followed by a fan-in step to gather the results. Here is how I have done it.

The details of my implementation with HackerNews are here for you to check out: https://github.com/serverless-swift/ch6-app with the video of the implementation: https://github.com/serverless-swift/ch6-app

Chapter 6 of my :-) book, which talks about it, is here (you might have access to it via O'Reilly): https://learning.oreilly.com/library/view/serverless-swift-apache/9781484258361/

And finally, the videos showing me implementing these steps are here:
step 1: https://youtu.be/0G3ji8RouKA
step 2: https://youtu.be/FYolLFvIsSc

We can have an additional call to go over it, or I could even help adjust my serverless backend to our needs if needed.

blumareks commented 4 years ago

@Shreyanand Shrey please have a look ^^ at the above explanations.

Shreyanand commented 4 years ago

@blumareks Thanks a lot for the detailed explanation. I went through the resources and, IIUC, there needs to be a database that the tweets are fetched into, a trigger that calls Watson NLU, and a fanning out process to collect the results for all the tweets in another database. In addition to the parallel processing of tweets, if Watson and the tweet database are on the same cloud server it saves even more time...

While it seems really interesting, and I'd love to talk to you about it to understand this more, for the immediate goal I think this would be a little difficult to implement.

Having said that, your multi-threading comment was a winner. I'm not sure if I interpreted it right, but I realized that there isn't any CPU task here and it's just IO calls to the API. So, I just gave naive multi-threading a try and it got the time from ~38s to ~2s.

Screenshot from 2020-10-07 16-30-17

Although it works, I want to confirm with you that this is acceptable and that it would not cause any other problems...
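For readers following along, a minimal sketch of what such naive multi-threading over I/O-bound calls might look like is below. It is not the code that was merged: `analyze_tweet` is a hypothetical stand-in that only sleeps to simulate network latency, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def analyze_tweet(text):
    """Hypothetical stand-in for the per-tweet NLU and Tone Analyzer requests."""
    time.sleep(0.05)  # simulated network latency; a real call would block on I/O here
    return {"text": text, "length": len(text)}


def analyze_all(tweets, max_workers=8):
    """Fan out one task per tweet to a thread pool, then fan in the results.

    pool.map returns results in the same order as the input tweets.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_tweet, tweets))


if __name__ == "__main__":
    tweets = [f"tweet {i}" for i in range(25)]
    start = time.perf_counter()
    results = analyze_all(tweets)
    print(f"{len(results)} results in {time.perf_counter() - start:.2f} s")
```

Threads help here because each worker spends its time waiting on the network rather than using the CPU. One thing to watch is that a large `max_workers` sends many simultaneous requests, so it may need to be capped to stay within whatever rate limits the API enforces.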

drealuc commented 4 years ago

Ready for unit testing and merge

sydrosa commented 4 years ago

This was merged into master -- I will close now.