tleb opened 3 months ago
I decided to investigate #304. I know this may be controversial, but a) the major culprits seem to be in code that did not change much, and b) since that revision splits the code into more functions, the flamegraphs contain more interesting information and it is easier to pick what to measure manually.
Workstation: 16GB DDR4 (~50% free), NVMe, i5-8250U (most cores were idle).
I used two methods:
The Elixir modification that uses cProfile for profiling can be seen here.
Note that the /usr/local/elixir/profiles/ directory needs to be writable by www-data for this to work.
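For reference, a minimal sketch of what such a cProfile hook could look like; `handle_request` is the entry point mentioned later in this thread, and the import path and file-naming scheme here are assumptions, not the actual patch:

```python
# Hypothetical sketch: wrap the request entry point with cProfile and
# dump one .prof file per request. The real patch linked above may differ.
import cProfile
import os
import time

from web import handle_request  # assumed existing entry point in web.py

PROFILE_DIR = '/usr/local/elixir/profiles'  # must be writable by www-data

def profiled_handle_request(*args, **kwargs):
    profiler = cProfile.Profile()
    try:
        return profiler.runcall(handle_request, *args, **kwargs)
    finally:
        # One profile per request; PID plus timestamp avoids name clashes
        path = os.path.join(PROFILE_DIR, f'{os.getpid()}-{time.time_ns()}.prof')
        profiler.dump_stats(path)
```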
Flamegraphs can be created from the resulting .prof files using flameprof and FlameGraph.
Example: flameprof --format=log 30931.prof | perl ~/apps/src/FlameGraph/flamegraph.pl > 30931.svg
I created three flamegraphs: one for the biggest identifier, one for a relatively big source file, and one for a big directory tree:
https://fbstc.org/scrap/uboot-u32-ident.svg
https://fbstc.org/scrap/uboot-lib-kconfig.svg
https://fbstc.org/scrap/uboot-include-tree.svg
In all three, a lot happens before web.py is reached; I think these are imports. In the source-file flamegraph, guess_lexer_for_filename takes the majority of the time spent in web.py. Of course, these are just three flamegraph samples, not enough information to make a fully informed decision, but some patterns can be seen. One could also ask whether the flamegraphs would look similar in production.
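To separate the call itself from the surrounding import cost, one can time it in isolation. A quick sketch; the file name and content here are purely illustrative:

```python
# Sanity check: time only the lexer guess, with the pygments import
# already paid up front.
import time
from pygments.lexers import guess_lexer_for_filename

with open('lib/Kconfig') as f:   # any sampled source file works
    code = f.read()

t0 = time.perf_counter()
lexer = guess_lexer_for_filename('lib/Kconfig', code)
print(f'guess took {time.perf_counter() - t0:.3f}s -> {lexer.name}')
```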
I also wrote my own simple profiler that logs the execution time of selected blocks and functions. The statistical methodology is probably dubious (results are averages of percentages of total request time), but I mainly needed it as a sanity check for the flamegraphs (some information in them looked suspicious).
The Elixir version that uses this profiler can be seen here.
Results from a short, pseudo-random ("clicking around") request sample can be seen here.
Results are divided into categories based on the route (and on the file type in the case of the source route). See the calls to measure_function and measure_block in web.py for the meaning of the measure point names; a minimal sketch of these helpers follows below.
Numbers represent the average percentage of total script execution time (measured from the first import to the end of handle_request) over all requests in that category. The report was generated using analyze_simple_profiler_logs.py.
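For illustration, here is a hypothetical reconstruction of what measure_function and measure_block could look like; the real helpers live in the linked branch and may differ:

```python
# Hypothetical reconstruction, not the actual code from the branch.
import time
from contextlib import contextmanager
from functools import wraps

timings = {}  # measure point name -> accumulated seconds for this request

@contextmanager
def measure_block(name):
    # Times an arbitrary block: `with measure_block('render'): ...`
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

def measure_function(func):
    # Times every call of the decorated function under its own name
    @wraps(func)
    def wrapper(*args, **kwargs):
        with measure_block(func.__name__):
            return func(*args, **kwargs)
    return wrapper
```

At the end of a request, the timings dict can be logged and later turned into the per-category percentages by the analysis script.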
We could of course dig deeper and take more accurate measurements, but I think the conclusions are already clear and actionable.
Ah, so even guess_lexer_for_filename, which looks like slow code, is actually slow imports as well. Let's wait for the switch to WSGI, then we'll do a second round of profiling. Cool insight you found here, thanks!
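That matches how CPython caches imports: a long-lived process pays the import cost once, while a CGI-style process pays it on every request. A small demonstration, assuming pygments is installed:

```python
# Cold vs. warm import: the second import is just a sys.modules lookup.
# Under CGI every request is a cold start; under WSGI only the first is.
import time

t0 = time.perf_counter()
import pygments.lexers            # cold: actually loads the package
print(f'cold import: {time.perf_counter() - t0:.3f}s')

t0 = time.perf_counter()
import pygments.lexers            # warm: found in sys.modules
print(f'warm import: {time.perf_counter() - t0:.6f}s')
```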
Thanks for all the investigation. What is holding us back from switching over to WSGI?
What is holding us back from switching over to WSGI?
Elixir still uses global state, and we need to get rid of that first. Most of it, if not all, should be gone after the recent web.py refactorings.
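For context, the shape this is heading towards is roughly the following: a WSGI callable where all per-request state is local, so one long-lived process can serve many requests. The names here are illustrative, not the actual web.py API:

```python
# Illustrative sketch only: per-request state in an object instead of
# module globals, served by any WSGI server (gunicorn, uwsgi, ...).
class RequestContext:
    """State that previously lived in module-level globals."""
    def __init__(self, environ):
        self.path = environ.get('PATH_INFO', '/')
        self.query = environ.get('QUERY_STRING', '')

def handle_request(ctx):
    # Placeholder for the real routing/rendering logic
    return f'<p>path={ctx.path}</p>'

def application(environ, start_response):
    ctx = RequestContext(environ)            # fresh state per request
    body = handle_request(ctx).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]
```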
Amazing! Then I'd say keep up the good work in this direction, so that we can move to WSGI in the near future.
Issue for tracking http/web.py performance work. On the official instance, a directory load (without caching) takes about 500ms and a small file load (without caching) takes about 750ms. That is a lot of time spent just reading data, so I expect low-hanging fruit. Profile http/web.py to see where time is spent.
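As a rough way to reproduce such numbers from the outside (note this includes network latency and cannot control server-side caching, unlike a server-side measurement; the URL is just an example):

```python
# Crude client-side latency check; run several times and look at the
# minimum to reduce network noise.
import time
import urllib.request

URL = 'https://elixir.bootlin.com/linux/latest/source'  # example page

samples = []
for _ in range(5):
    t0 = time.perf_counter()
    with urllib.request.urlopen(URL) as resp:
        resp.read()
    samples.append(time.perf_counter() - t0)

print(f'min {min(samples)*1000:.0f}ms, '
      f'avg {sum(samples)/len(samples)*1000:.0f}ms')
```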