tleb opened 3 months ago
I decided to investigate #304. I know this may be controversial, but a) the major culprits seem to be in code that did not change much, and b) since that revision splits the code into more functions, the flamegraphs contain more interesting information and it is easier to pick what to measure manually.
Workstation: 16GB DDR4 (~50% free), NVMe, i5-8250U (most cores were idle).
I used two methods:
The Elixir modification that uses cProfile for profiling can be seen here.
Note that the /usr/local/elixir/profiles/ directory needs to be writable by www-data for this to work.
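For reference, a minimal sketch of what such a cProfile hook could look like; `handle_request` is the entry point mentioned later in this thread, and the import path and file-naming scheme here are assumptions, not the actual patch:

```python
# Hypothetical sketch: wrap the request entry point with cProfile and
# dump one .prof file per request. The real patch linked above may differ.
import cProfile
import os
import time

from web import handle_request  # assumed existing entry point in web.py

PROFILE_DIR = '/usr/local/elixir/profiles'  # must be writable by www-data

def profiled_handle_request(*args, **kwargs):
    profiler = cProfile.Profile()
    try:
        return profiler.runcall(handle_request, *args, **kwargs)
    finally:
        # One profile per request; PID plus timestamp avoids name clashes
        path = os.path.join(PROFILE_DIR, f'{os.getpid()}-{time.time_ns()}.prof')
        profiler.dump_stats(path)
```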
Flamegraphs can be created from the resulting .prof files using flameprof and FlameGraph.
Example: flameprof --format=log 30931.prof | perl ~/apps/src/FlameGraph/flamegraph.pl > 30931.svg
I created three flamegraphs: one for the biggest identifier, one for a relatively big source file, and one for a big directory tree:
https://fbstc.org/scrap/uboot-u32-ident.svg
https://fbstc.org/scrap/uboot-lib-kconfig.svg
https://fbstc.org/scrap/uboot-include-tree.svg
In all three, a lot happens before web.py is reached; I think these are imports. In the source-file flamegraph, guess_lexer_for_filename takes the majority of the time spent in web.py. Of course, these are just three flamegraph samples, not enough information to make a fully informed decision, but some patterns can be seen. One could also ask whether the flamegraphs would look similar in production.
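To separate the call itself from the surrounding import cost, one can time it in isolation. A quick sketch; the file name and content here are purely illustrative:

```python
# Sanity check: time only the lexer guess, with the pygments import
# already paid up front.
import time
from pygments.lexers import guess_lexer_for_filename

with open('lib/Kconfig') as f:   # any sampled source file works
    code = f.read()

t0 = time.perf_counter()
lexer = guess_lexer_for_filename('lib/Kconfig', code)
print(f'guess took {time.perf_counter() - t0:.3f}s -> {lexer.name}')
```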
I also wrote my own simple profiler that logs the execution time of selected blocks and functions. The statistical methodology is probably dubious (results are averages of percentages of total request time), but I mainly needed it as a sanity check for the flamegraphs (some information in them looked suspicious).
The Elixir version that uses this profiler can be seen here.
Results from a short, pseudo-random ("clicking around") request sample can be seen here.
Results are divided into categories based on the route (and on the file type in the case of the source route). See the calls to measure_function and measure_block in web.py for the meaning of the measure point names; a minimal sketch of these helpers follows below.
Numbers represent the average percentage of total script execution time (measured from the first import to the end of handle_request) over all requests in that category. The report was generated using analyze_simple_profiler_logs.py.
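For illustration, here is a hypothetical reconstruction of what measure_function and measure_block could look like; the real helpers live in the linked branch and may differ:

```python
# Hypothetical reconstruction, not the actual code from the branch.
import time
from contextlib import contextmanager
from functools import wraps

timings = {}  # measure point name -> accumulated seconds for this request

@contextmanager
def measure_block(name):
    # Times an arbitrary block: `with measure_block('render'): ...`
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

def measure_function(func):
    # Times every call of the decorated function under its own name
    @wraps(func)
    def wrapper(*args, **kwargs):
        with measure_block(func.__name__):
            return func(*args, **kwargs)
    return wrapper
```

At the end of a request, the timings dict can be logged and later turned into the per-category percentages by the analysis script.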
We could of course dig deeper and take more accurate measurements, but I think the conclusions are already clear and actionable.
Ah, so even guess_lexer_for_filename, which looks like slow code, is actually slow imports as well. Let's wait for the switch to WSGI, then we'll do a second round of profiling. Cool insight you found here, thanks!
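That matches how CPython caches imports: a long-lived process pays the import cost once, while a CGI-style process pays it on every request. A small demonstration, assuming pygments is installed:

```python
# Cold vs. warm import: the second import is just a sys.modules lookup.
# Under CGI every request is a cold start; under WSGI only the first is.
import time

t0 = time.perf_counter()
import pygments.lexers            # cold: actually loads the package
print(f'cold import: {time.perf_counter() - t0:.3f}s')

t0 = time.perf_counter()
import pygments.lexers            # warm: found in sys.modules
print(f'warm import: {time.perf_counter() - t0:.6f}s')
```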
Thanks for all the investigation. What is holding us back from switching over to WSGI?
What is holding us back from switching over to WSGI?
Elixir still uses global state, and we need to get rid of that first. Most of it, if not all, should be gone after the recent web.py refactorings.
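For context, the shape this is heading towards is roughly the following: a WSGI callable where all per-request state is local, so one long-lived process can serve many requests. The names here are illustrative, not the actual web.py API:

```python
# Illustrative sketch only: per-request state in an object instead of
# module globals, served by any WSGI server (gunicorn, uwsgi, ...).
class RequestContext:
    """State that previously lived in module-level globals."""
    def __init__(self, environ):
        self.path = environ.get('PATH_INFO', '/')
        self.query = environ.get('QUERY_STRING', '')

def handle_request(ctx):
    # Placeholder for the real routing/rendering logic
    return f'<p>path={ctx.path}</p>'

def application(environ, start_response):
    ctx = RequestContext(environ)            # fresh state per request
    body = handle_request(ctx).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]
```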
Amazing! Then I'd say keep up the good work in this direction, so that we can move to WSGI in the near future.
Issue for tracking http/web.py performance work. On the official instance, a directory load (without caching) takes about 500ms and a small file load (without caching) takes about 750ms. That is a lot of time spent just reading data, so I expect low-hanging fruit. Profile http/web.py to see where time is spent.
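As a rough way to reproduce such numbers from the outside (note this includes network latency and cannot control server-side caching, unlike a server-side measurement; the URL is just an example):

```python
# Crude client-side latency check; run several times and look at the
# minimum to reduce network noise.
import time
import urllib.request

URL = 'https://elixir.bootlin.com/linux/latest/source'  # example page

samples = []
for _ in range(5):
    t0 = time.perf_counter()
    with urllib.request.urlopen(URL) as resp:
        resp.read()
    samples.append(time.perf_counter() - t0)

print(f'min {min(samples)*1000:.0f}ms, '
      f'avg {sum(samples)/len(samples)*1000:.0f}ms')
```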