bwbaugh / infertweet

Infer information from Tweets. Useful for human-centered computing tasks, such as sentiment analysis, location prediction, authorship profiling and more!
http://infertweet.bwbaugh.com/

Process sentiment classification queries asynchronously #26

Closed bwbaugh closed 11 years ago

bwbaugh commented 11 years ago

Currently I have some mixed thoughts on this. Ideally, the pages should render so quickly that you don't mind blocking the main IO loop of the web server for each request. However, I think it might still be worthwhile to experiment with handing the request off to the classifier (currently the most time-consuming step) asynchronously, so that we can keep responding to other requests in the meantime.
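
In case it helps to make the idea concrete, here is a minimal sketch of handing the classifier call off to a worker thread so the IO loop stays free, assuming a Tornado-style handler; the handler name, stub classifier, and executor setup are illustrative only, not the actual InferTweet code:

```python
# Sketch only: handler name, stub classifier, and executor setup are
# hypothetical, not the actual InferTweet code.
from concurrent.futures import ThreadPoolExecutor

import tornado.ioloop
import tornado.web


class _StubClassifier(object):
    """Stand-in for the real (pickled) sentiment classifier."""

    def classify(self, text):
        return 'positive' if 'love' in text else 'negative'


classifier = _StubClassifier()
executor = ThreadPoolExecutor(max_workers=4)


class ClassifyHandler(tornado.web.RequestHandler):

    async def get(self):
        text = self.get_argument('text')
        # The classifier call is the slow part, so run it on a worker
        # thread instead of blocking the IO loop.
        label = await tornado.ioloop.IOLoop.current().run_in_executor(
            executor, classifier.classify, text)
        self.write({'label': label})
```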

bwbaugh commented 11 years ago

Before implementing this, I wanted to perform some preliminary performance experiments.

I downloaded PyPy, and wow, what a difference. It loads the classifier from the pickle file very quickly, whereas regular Python takes several minutes. The motivation for using an RPC server for the classifier was that it took a long time to load, which made restarting the web server (something that has to be done often during development) very costly.

In addition, the classifier is at least an order of magnitude faster when accessed locally rather than through the RPC server. This is to be expected, but the point is that the RPC server must provide some worthwhile benefit to justify the lower speed; otherwise we should just load the classifier locally.
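
For reference, a rough sketch of the kind of comparison involved, timing local (pickle-loaded) access against RPC access; the pickle path, server URL, and the assumption that the RPC server speaks XML-RPC are all placeholders rather than the actual InferTweet setup:

```python
# Rough timing harness; paths, URL, and the XML-RPC assumption are
# placeholders, not the real InferTweet configuration.
import pickle
import time
import xmlrpc.client


def time_local(pickle_path, texts):
    """Load the classifier from disk and classify in-process."""
    start = time.time()
    with open(pickle_path, 'rb') as f:
        classifier = pickle.load(f)
    load_seconds = time.time() - start

    start = time.time()
    for text in texts:
        classifier.classify(text)
    return load_seconds, time.time() - start


def time_rpc(server_url, texts):
    """Classify through the RPC server; each call pays the network cost."""
    proxy = xmlrpc.client.ServerProxy(server_url)
    start = time.time()
    for text in texts:
        proxy.classify(text)
    return time.time() - start
```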

To tie this all back into this issue, if the classification can be done quickly enough, then the pages can be rendered so quickly that we don't mind blocking the IO loop.

Therefore, I currently see no need to process the queries asynchronously. However, this may change if significant changes are made in the future. Perhaps we will need to handle long-running queries, such as a request that needs to classify thousands of tweets, either because of the nature of the procedure or because we implement an API that can classify in bulk.

bwbaugh commented 11 years ago

With the addition of asking Twitter for tweets matching keywords, there is now a reason to process sentiment requests asynchronously, since waiting for the results from Twitter takes time.
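
A sketch of what the non-blocking version might look like, again assuming Tornado; the search URL is simplified (the real Twitter API requires authentication), and `classify()` is a stand-in for the real classifier:

```python
# Illustrative only: the search URL is simplified (the real Twitter API
# needs OAuth) and classify() stands in for the real classifier.
import json
import urllib.parse

import tornado.httpclient
import tornado.web


def classify(text):
    """Placeholder for the real sentiment classifier."""
    return 'positive' if 'good' in text else 'negative'


class TwitterSentimentHandler(tornado.web.RequestHandler):

    async def get(self):
        query = self.get_argument('q')
        url = ('https://api.twitter.com/1.1/search/tweets.json?q=' +
               urllib.parse.quote(query))
        client = tornado.httpclient.AsyncHTTPClient()
        # Waiting on Twitter is the slow part; awaiting the fetch lets
        # the IO loop keep serving other requests in the meantime.
        response = await client.fetch(url)
        tweets = json.loads(response.body).get('statuses', [])
        labels = [classify(tweet['text']) for tweet in tweets]
        self.write({'query': query, 'labels': labels})
```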

bwbaugh commented 11 years ago

Since we aren't processing every sentiment classification request asynchronously, but instead only those from Twitter, I have decided to make a new issue for that purpose and close this one for the original reasons.