Closed bwbaugh closed 11 years ago
Before implementing this, I wanted to perform some preliminary performance experiments.
I downloaded PyPy, and wow, what a difference. It loads the classifier from the pickle file very quickly, whereas regular Python takes several minutes. The motivation for using an RPC server for the classifier was that it took a long time to load, which made restarting the web server very costly, and that has to be done often during development.
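For reference, the load-time measurement amounts to timing `pickle.load` on the saved classifier. A minimal sketch, using a tiny stand-in dictionary since the real project pickles a much larger trained classifier:

```python
import pickle
import time

# Hypothetical stand-in for the trained classifier; the real pickle
# file is far larger, which is where the load time comes from.
classifier = {"word": {"pos": 3, "neg": 1}}

with open("classifier.pickle", "wb") as f:
    pickle.dump(classifier, f, protocol=pickle.HIGHEST_PROTOCOL)

start = time.time()
with open("classifier.pickle", "rb") as f:
    loaded = pickle.load(f)
elapsed = time.time() - start
print("load took %.4f seconds" % elapsed)
```

With a realistically sized pickle, running this under CPython versus PyPy shows the difference described above.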
In addition, accessing the classifier locally is at least an order of magnitude faster than going through the RPC server. This is to be expected, but the point is that the RPC server must provide some worthwhile benefit to justify the lower speed; otherwise we should just load the classifier locally.
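The local-versus-RPC comparison can be reproduced with a micro-benchmark. A minimal sketch using the standard library's XML-RPC modules (the actual project's RPC mechanism may differ), with a hypothetical `classify()` standing in for the real classifier:

```python
import threading
import time
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Hypothetical classify() standing in for the real classifier call.
def classify(text):
    return "pos" if "good" in text else "neg"

# Serve it over XML-RPC on an OS-assigned localhost port.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(classify)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy("http://localhost:%d" % port)

N = 100
start = time.time()
for _ in range(N):
    classify("good movie")  # direct, in-process call
local_time = time.time() - start

start = time.time()
for _ in range(N):
    proxy.classify("good movie")  # same call via RPC round trips
rpc_time = time.time() - start

print("local: %.4fs  rpc: %.4fs" % (local_time, rpc_time))
server.shutdown()
```

Each RPC call pays for serialization plus an HTTP round trip, which is where the order-of-magnitude gap comes from.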
To tie this all back into this issue, if the classification can be done quickly enough, then the pages can be rendered so quickly that we don't mind blocking the IO loop.
Therefore, I currently see no need to process the queries asynchronously. However, this may change if significant changes are made in the future. Perhaps we will need to handle long-running queries, such as a request that needs to classify thousands of tweets, either because of the nature of the procedure or because we implement an API that can classify in bulk.
With the addition of asking Twitter for tweets matching keywords, there is now reason to process sentiment requests asynchronously since waiting for the results from Twitter takes time.
Since we aren't processing every sentiment classification request asynchronously, but only those involving Twitter, I have decided to open a new issue for that purpose and close this one for the original reasons.
Currently I have mixed thoughts on this. Ideally, the pages should render so quickly that we don't mind blocking the main IO loop of the web server for each request. However, I think it might still be worthwhile to experiment with handing the request off to the classifier (currently the slowest step) asynchronously, so that we can respond to other requests in the meantime.
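One way to experiment with that handoff is to push the classifier call onto a thread pool so the main loop stays free. A minimal sketch with a hypothetical `classify()` function, using the standard library rather than the project's actual web framework:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical classify() standing in for the real classifier call,
# which is the slow part of each request.
def classify(text):
    time.sleep(0.01)  # simulate classification work
    return "pos" if "good" in text else "neg"

executor = ThreadPoolExecutor(max_workers=4)

# Hand the slow call off to a worker thread; the main thread (or the
# web server's IO loop) is free to handle other requests meanwhile.
future = executor.submit(classify, "good movie")
# ... other requests could be served here ...
result = future.result()  # collect the label when it's ready
print(result)
executor.shutdown()
```

In a Tornado-style server the same idea applies: the handler submits the classification to an executor and resumes when the future resolves, instead of blocking the IO loop.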