gazetteerhk / census_explorer

Explore Hong Kong's neighborhoods through visualizations of census data
http://gazetteer.hk
MIT License

pandas API optimization #41

Open hupili opened 10 years ago

hupili commented 10 years ago

Redirected from #11 and emails.

Some optimization ideas. Although pandas is good at in-memory computation, some queries are still slow, taking close to 1 second. That could become a problem with more users. Most use cases filter down to a single "table", so a precomputed result along this dimension should drastically reduce computation, to roughly a hundredth of the current cost. This heuristic alone should be good enough, but we had better do proper profiling first (e.g. enumerate the parameters to get a better picture of which query types are the bottleneck).
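A minimal sketch of the "precompute per table" idea, written in plain Python so it runs without dependencies. The row layout, the column names (`table`, `area`, `row`, `value`) and the helpers `build_table_index` and `query` are assumptions for illustration, not the project's actual schema or API. With a pandas DataFrame the same partitioning could be done once with `DataFrame.groupby`.

```python
from collections import defaultdict

# Hypothetical long-format rows; the real dataset's columns may differ.
ROWS = [
    {"table": "age", "area": "a01", "row": "0-14", "value": 120},
    {"table": "age", "area": "a02", "row": "0-14", "value": 95},
    {"table": "income", "area": "a01", "row": "<10k", "value": 40},
    {"table": "income", "area": "a02", "row": "<10k", "value": 55},
]


def build_table_index(rows):
    """Partition the rows by their 'table' value once, e.g. at server startup."""
    index = defaultdict(list)
    for r in rows:
        index[r["table"]].append(r)
    return dict(index)


def query(index, table, **filters):
    """Apply the remaining filters only to the precomputed slice for `table`."""
    candidates = index.get(table, [])
    return [
        r for r in candidates
        if all(r.get(key) == wanted for key, wanted in filters.items())
    ]


TABLE_INDEX = build_table_index(ROWS)
result = query(TABLE_INDEX, "age", area="a02")
```

Each query then scans only the rows of one table instead of the whole dataset, which is where the expected saving would come from.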

hupili commented 10 years ago

@hxu I see you have a test suite in the FE that enumerates some API parameters. I don't know how to use it. Is it easy to turn it into a benchmarking tool?

We can first measure the average time to run through the test suite. After optimization, we'll then have a concrete idea of how much it improved.

If it's not easy in the FE, I'll build benchmarking tools in the BE, but probably a bit later.
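A back-end benchmark along these lines could be sketched as follows. This is only an illustration: `fake_query` stands in for a real API call, and the parameter names in `OPTIONS` are hypothetical.

```python
import itertools
import statistics
import time


def enumerate_params(options):
    """Yield one dict per combination of the given parameter values."""
    keys = sorted(options)
    for combo in itertools.product(*(options[k] for k in keys)):
        yield dict(zip(keys, combo))


def benchmark(run_query, options, repeat=3):
    """Time run_query(**params) for every combination.

    Returns a list of (params, mean_seconds) pairs.
    """
    timings = []
    for params in enumerate_params(options):
        samples = []
        for _ in range(repeat):
            start = time.perf_counter()
            run_query(**params)
            samples.append(time.perf_counter() - start)
        timings.append((params, statistics.mean(samples)))
    return timings


def fake_query(table, area):
    # Stand-in for a real API call (assumption: the real one takes similar arguments).
    return sum(range(1000))


# Hypothetical parameter space to enumerate.
OPTIONS = {"table": ["age", "income"], "area": ["a01", "a02", "a03"]}

report = benchmark(fake_query, OPTIONS)
overall_mean = statistics.mean(seconds for _, seconds in report)
slowest = max(report, key=lambda item: item[1])
```

Running it before and after an optimization and comparing `overall_mean` gives the "how much is improved" number, while `slowest` points at the bottleneck parameter combination.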

hxu commented 10 years ago

Actually the front end tests mock out the back end, so they never hit the server. You should probably use python tests for this.

hxu commented 10 years ago

Though I suppose you could use it to try to hit the database. Let me see if I can whip something up.

hxu commented 10 years ago

@hupili check out branch benchmark, start your local dev server with grunt serve, then go to localhost:9000/#/benchmark.

Clicking on Run Benchmarks will send all of the requests at once, then wait for them to return. Sometimes it can take up to 20 seconds before the first request returns.

Clicking on Run in sequence will send one request, then wait for it to return before sending the next one.
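The difference between the two modes can be sketched as below. This is not the code on the benchmark branch (which runs in the browser); it is a rough Python analogue in which `fake_request` simulates a request with a fixed delay, and the query names are just labels.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(name, delay=0.05):
    """Stand-in for one HTTP request; sleeps instead of hitting a server."""
    time.sleep(delay)
    return name


def run_in_sequence(names):
    """Send one request, wait for it to return, then send the next."""
    start = time.perf_counter()
    results = [fake_request(n) for n in names]
    return results, time.perf_counter() - start


def run_all_at_once(names):
    """Send every request at the same time, then wait for all of them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = list(pool.map(fake_request, names))
    return results, time.perf_counter() - start


QUERIES = ["age", "ageMale", "ageFemale", "income"]
seq_results, seq_seconds = run_in_sequence(QUERIES)
par_results, par_seconds = run_all_at_once(QUERIES)
```

With a real server, the "all at once" mode measures behavior under concurrent load, while the sequential mode gives cleaner per-request latencies.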

You can already see some patterns in the behavior of the API from these tests. It seems like the first request "warms up" the server: the first request takes maybe 6-7 seconds, then subsequent requests take about a second each.

Also, related queries are fast if done one after the other. For example, querying for age, then ageMale, then ageFemale will show that the last two queries take about a third of the time of the first request.

hxu commented 10 years ago

By the way, it turned out that it was not possible to hit the backend in the testing suite because angular-mocks mocks out the normal $httpBackend, and you need angular-mocks to run the tests.