docker-library / python

Docker Official Image packaging for Python
https://www.python.org/
MIT License

Why is python shipped with the `python` docker image slower than that of my local machine? #825

Open SimonLammer opened 1 year ago

SimonLammer commented 1 year ago

I've observed a roughly 11% performance overhead when using the python distribution shipped with the python:3 image, compared to the python distribution installable through ppa:deadsnakes/ppa: https://stackoverflow.com/a/76133102/2808520

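+------+------------+--------------+
| stat | local      | dockerbinary |
+======+============+==============+
| avg  | 0.79917586 | 0.89829016   |
| std  | 0.02433539 | 0.03554546   |
| min  | 0.78087375 | 0.86344007   |
| q1   | 0.78211388 | 0.86950620   |
| q2   | 0.79006154 | 0.88853465   |
| q3   | 0.80732969 | 0.91612282   |
| max  | 0.89824817 | 0.99477790   |
+------+------------+--------------+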
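The seven statistics above can be reproduced from raw per-run timings with a few lines of Python. This is a minimal sketch. The original measurement script isn't part of the thread, so the quartile method (`statistics.quantiles` with its default "exclusive" method) and the use of the sample standard deviation are assumptions. Applied to the two averages, it also makes the baseline explicit: the Docker binary needs about 12.4% more time than the local one, which is equivalent to about 11.0% less throughput.

```python
import statistics


def summarize(samples):
    """Summarize per-run timings with the statistics used in the table above."""
    # Default method is "exclusive"; other tools (e.g. NumPy) may interpolate differently.
    q1, q2, q3 = statistics.quantiles(samples, n=4)
    return {
        "avg": statistics.mean(samples),
        "std": statistics.stdev(samples),  # sample standard deviation (an assumption)
        "min": min(samples),
        "q1": q1,
        "q2": q2,  # equals the median
        "q3": q3,
        "max": max(samples),
    }


def slowdown(baseline, other):
    """Extra time `other` needs relative to `baseline` (0.10 means 10% slower)."""
    return other / baseline - 1.0


def throughput_loss(baseline, other):
    """Fraction of throughput lost when switching from `baseline` to `other`."""
    return 1.0 - baseline / other


if __name__ == "__main__":
    local_avg, docker_avg = 0.79917586, 0.89829016  # averages from the table above
    print(f"slowdown vs. local: {slowdown(local_avg, docker_avg):.1%}")         # 12.4%
    print(f"throughput lost:    {throughput_loss(local_avg, docker_avg):.1%}")  # 11.0%
```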
$ file `which python3.10`
/usr/bin/python3.10: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=fb3f4369481251e6ba441382fd6d9ab47af0db29, for GNU/Linux 3.2.0, stripped
$ file docker-python/local/bin/python3.10
docker-python/local/bin/python3.10: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=618b23f947f202224f4ea8e16375ac7bcad13c4f, for GNU/Linux 3.2.0, with debug_info, not stripped
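To compare the two binaries without relying on `file`'s wording, the ELF section table can be read directly. The sketch below is a deliberately minimal reader that assumes a 64-bit little-endian file (as on x86-64 and aarch64 Linux) and skips rare corner cases such as extended section numbering. It treats `.debug_info` (DWARF debug data) as the marker for "with debug_info" and `.symtab` (the symbol table) as the marker for "not stripped", which is roughly what `file` checks; `elf_sections.py` is a hypothetical file name.

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"
ELFCLASS64, ELFDATA2LSB = 2, 1


def elf_section_names(path):
    """List section names of a 64-bit little-endian ELF file (sketch, not a full parser)."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if ident[:4] != ELF_MAGIC:
            raise ValueError(f"{path} is not an ELF file")
        if ident[4] != ELFCLASS64 or ident[5] != ELFDATA2LSB:
            raise ValueError("this sketch only handles 64-bit little-endian ELF")
        # Remaining 48 bytes of the ELF64 header; only the section-header fields are used.
        fields = struct.unpack("<HHIQQQIHHHHHH", f.read(48))
        e_shoff, e_shentsize, e_shnum, e_shstrndx = fields[5], fields[10], fields[11], fields[12]
        if e_shnum == 0:
            return []
        headers = []
        for i in range(e_shnum):
            f.seek(e_shoff + i * e_shentsize)
            # ELF64 section header: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size, ...
            headers.append(struct.unpack("<IIQQQQIIQQ", f.read(64)))
        strtab = headers[e_shstrndx]
        f.seek(strtab[4])               # sh_offset of the section-name string table
        names_blob = f.read(strtab[5])  # sh_size of the section-name string table
    names = []
    for hdr in headers:
        start = hdr[0]  # sh_name: offset into the section-name string table
        end = names_blob.index(b"\0", start)
        names.append(names_blob[start:end].decode("ascii", "replace"))
    return names


def describe(path):
    names = set(elf_section_names(path))
    return {
        "has_symtab": ".symtab" in names,          # absent => `file` says "stripped"
        "has_debug_info": ".debug_info" in names,  # present => "with debug_info"
    }


if __name__ == "__main__":
    # e.g. python3 elf_sections.py /usr/bin/python3.10 docker-python/local/bin/python3.10
    for path in sys.argv[1:]:
        print(path, describe(path))
```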

My guess is that compiling `with debug_info` introduces this ~11% performance overhead.

I'd appreciate it if someone could tell me whether my guess is correct.

tianon commented 1 year ago

https://github.com/docker-library/python/issues/575 might have some useful ideas/info/discussion in it for you

sowinski commented 9 months ago

Anything new about this?

blopker commented 8 months ago

For what it's worth, debug info (`with debug_info`) isn't known to have any performance impact other than a somewhat larger binary. There are several (non-Python) discussions about this, for example: https://stackoverflow.com/questions/8676466/how-do-debug-symbols-affect-performance-of-a-linux-executable-compiled-by-gcc

I'd guess the slowdown is likely due to container security overhead. You should try to run your tests with docker run --security-opt seccomp:unconfined. See: https://stackoverflow.com/questions/60840320/docker-50-performance-hit-on-cpu-intensive-code
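A sketch for trying this suggestion, assuming the Docker CLI is installed and the `python:3` image is available. The workload times itself inside the container, so container start-up is not counted, and the same command is run with and without seccomp confinement. It uses the `seccomp=unconfined` spelling from Docker's documentation; the workload and the file name `compare_seccomp.py` are illustrative.

```python
import subprocess
import sys

# Illustrative micro-workload; it times itself so container start-up is excluded.
WORKLOAD = (
    "import time\n"
    "t = time.perf_counter()\n"
    "sum(i * i for i in range(5_000_000))\n"
    "print(time.perf_counter() - t)\n"
)


def docker_argv(image, code, unconfined_seccomp=False):
    """Build a `docker run` command line, optionally with seccomp disabled."""
    argv = ["docker", "run", "--rm"]
    if unconfined_seccomp:
        argv += ["--security-opt", "seccomp=unconfined"]
    argv += [image, "python", "-c", code]
    return argv


def time_in_container(image, unconfined_seccomp=False):
    """Run WORKLOAD in a fresh container and return the time it reports (seconds)."""
    result = subprocess.run(
        docker_argv(image, WORKLOAD, unconfined_seccomp),
        check=True,
        capture_output=True,
        text=True,
    )
    return float(result.stdout.strip())


if __name__ == "__main__":
    if len(sys.argv) > 1:  # e.g. python3 compare_seccomp.py python:3
        image = sys.argv[1]
        for unconfined in (False, True):
            best = min(time_in_container(image, unconfined) for _ in range(5))
            print(f"seccomp unconfined={unconfined}: best of 5 = {best:.3f} s")
    else:
        print("usage: compare_seccomp.py IMAGE")
```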

SimonLammer commented 8 months ago

> I'd guess the slowdown is likely due to container security overhead. You should try to run your tests with docker run --security-opt seccomp:unconfined. See: https://stackoverflow.com/questions/60840320/docker-50-performance-hit-on-cpu-intensive-code

Docker itself would add more overhead on top of that unless some security features are disabled: running the tests in Docker with --privileged yielded results very similar to "dockerbinary", while standard Docker took about twice as long. The tests for "dockerbinary" ran without docker - I copied the python version distributed via docker to my host machine and proceeded to execute the tests with that directly on my host; and still observed the ~11% performance overhead.

blopker commented 8 months ago

I see, I missed that the Docker Python binary was extracted and then tested on the host. That said, I looked around a bit more and couldn't find any evidence that debug symbols hurt performance. If you still have the test set up, it looks like you can strip a binary after it was compiled with strip --strip-debug. That could be an easy way to test the theory. Otherwise, there might be something else going on.
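A sketch of the suggested experiment, assuming GNU binutils' `strip` is on PATH. It strips a copy, never the original, and reports the size change; the stripped copy can then be benchmarked against the original. Placing the copy in the same `bin` directory as the original is an assumption meant to keep the interpreter's standard-library lookup identical. If the interpreter is linked against a shared `libpython`, that library may carry debug info too and would need the same treatment. `strip_test.py` is a hypothetical name.

```python
import os
import shutil
import subprocess
import sys


def make_stripped_copy(binary, dest_dir, mode="--strip-debug"):
    """Copy `binary` into `dest_dir` and run `strip` on the copy only.

    `mode` is passed straight to GNU strip, e.g. "--strip-debug" (drop debug
    sections only) or "--strip-all" (also drop the symbol table).
    """
    dest = os.path.join(dest_dir, os.path.basename(binary) + "-stripped")
    shutil.copy2(binary, dest)  # follows symlinks, so the real binary is copied
    os.chmod(dest, 0o755)       # make sure the copy is writable and executable
    subprocess.run(["strip", mode, dest], check=True)
    return dest


def size_after_strip(binary, dest_dir, mode="--strip-debug"):
    """Return (original size, stripped size) in bytes."""
    stripped = make_stripped_copy(binary, dest_dir, mode)
    return os.path.getsize(binary), os.path.getsize(stripped)


if __name__ == "__main__":
    # e.g. python3 strip_test.py /usr/local/bin/python3.10 /usr/local/bin
    if len(sys.argv) == 3:
        before, after = size_after_strip(sys.argv[1], sys.argv[2])
        print(f"{sys.argv[1]}: {before} -> {after} bytes")
    else:
        print("usage: strip_test.py BINARY DEST_DIR")
```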

blopker commented 8 months ago

Cool, I ran some quick (read: possibly unreliable) benchmarks with Python 3.12.1 from the official Docker image and from Deadsnakes. I also ran a test with a stripped version of the Docker binary. These tests were run inside Docker: the official binary within the official container, and the Deadsnakes binary in the latest Ubuntu container. All on my Mac M1 laptop.

I ran the float benchmark from pyperformance in rigorous mode: pyperformance run -b float -r -o NAME.json.
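For anyone without pyperformance at hand, here is a rough stand-in, loosely modeled on the kind of work the `float` benchmark does (creating many small objects with `sin`/`cos`/`sqrt` arithmetic, then a pairwise max). It is written for illustration, not copied from pyperformance, and `timeit` numbers are much less rigorous than pyperformance's `-r` mode. Run it under each interpreter being compared.

```python
import math
import timeit


class Point:
    __slots__ = ("x", "y", "z")

    def __init__(self, i):
        self.x = x = math.sin(i)
        self.y = math.cos(i) * 3
        self.z = (x * x) / 2

    def normalize(self):
        norm = math.sqrt(self.x * self.x + self.y * self.y + self.z * self.z)
        self.x /= norm
        self.y /= norm
        self.z /= norm

    def maximize(self, other):
        # Component-wise maximum, stored in place.
        self.x = max(self.x, other.x)
        self.y = max(self.y, other.y)
        self.z = max(self.z, other.z)
        return self


def float_workload(n=100_000):
    points = [Point(i) for i in range(n)]
    for p in points:
        p.normalize()
    best = points[0]
    for p in points[1:]:
        best = best.maximize(p)
    return best


if __name__ == "__main__":
    # Repeat several times and keep the minimum, which is least affected by noise.
    times = timeit.repeat(float_workload, number=1, repeat=5)
    print(f"best of 5: {min(times) * 1000:.1f} ms")
```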

Results: Official Docker binary vs. Deadsnakes

+-----------+---------------------+-------------------+--------------+----------------------+
| Benchmark | pydocker_float.json | pydead_float.json | Change       | Significance         |
+===========+=====================+===================+==============+======================+
| float     | 63.5 ms             | 60.8 ms           | 1.04x faster | Significant (t=9.11) |
+-----------+---------------------+-------------------+--------------+----------------------+
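The "Significance" column appears to come from a two-sample t-test on the per-run timings stored in the two JSON files. Below is a sketch of a pooled-variance Student's t statistic. pyperf's exact formula and critical value may differ, and the default threshold of 1.96 is a large-sample approximation for a two-tailed 95% test (an assumption of this sketch; small samples need a larger value from the t-distribution).

```python
import math
import statistics


def pooled_t_score(sample1, sample2):
    """Student's two-sample t statistic using a pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    ss1 = math.fsum((x - m1) ** 2 for x in sample1)
    ss2 = math.fsum((x - m2) ** 2 for x in sample2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    # Undefined (ZeroDivisionError) if both samples are constant.
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def is_significant(sample1, sample2, critical=1.96):
    """Return (significant, t); 1.96 approximates a two-tailed 95% test for large samples."""
    t = pooled_t_score(sample1, sample2)
    return abs(t) >= critical, t


if __name__ == "__main__":
    # Toy numbers for illustration only, not the benchmark data above.
    print(is_significant([63.1, 63.9, 63.4, 63.6], [60.9, 60.7, 61.0, 60.6]))
```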

Official Docker binary vs. the same binary with strip --strip-all applied:

+-----------+---------------------+------------------------------+--------------+----------------------+
| Benchmark | pydocker_float.json | pydocker_float_stripped.json | Change       | Significance         |
+===========+=====================+==============================+==============+======================+
| float     | 63.5 ms             | 61.2 ms                      | 1.04x faster | Significant (t=9.97) |
+-----------+---------------------+------------------------------+--------------+----------------------+

And finally, stripped official binary vs. Deadsnakes:

+-----------+------------------------------+-------------------+--------------+-----------------+
| Benchmark | pydocker_float_stripped.json | pydead_float.json | Change       | Significance    |
+===========+==============================+===================+==============+=================+
| float     | 61.2 ms                      | 60.8 ms           | 1.01x faster | Not significant |
+-----------+------------------------------+-------------------+--------------+-----------------+
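The "Change" column appears to be the ratio of the two mean timings. A sketch that recomputes it from the rounded means in the three tables is below (pyperf works with unrounded means, so the last digit could differ). It also shows what 1.04x means as time saved: about 4.3% for Deadsnakes and about 3.6% for the stripped binary.

```python
def change(ref_ms, new_ms):
    """Format a pyperf-style "Change" cell from two mean timings (lower is faster)."""
    ratio = ref_ms / new_ms
    if ratio >= 1:
        return f"{ratio:.2f}x faster"
    return f"{1 / ratio:.2f}x slower"


def time_saved(ref_ms, new_ms):
    """Fraction of wall time saved by `new` relative to `ref`."""
    return 1 - new_ms / ref_ms


# Rounded means (ms) copied from the three tables above.
COMPARISONS = {
    "official vs Deadsnakes": (63.5, 60.8),
    "official vs stripped official": (63.5, 61.2),
    "stripped official vs Deadsnakes": (61.2, 60.8),
}

if __name__ == "__main__":
    for label, (ref, new) in COMPARISONS.items():
        print(f"{label}: {change(ref, new)}, {time_saved(ref, new):.1%} less time")
```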

Analysis: While I'm not seeing the 11% performance difference, there seems to be a speedup of about 4% (1.04x) when stripping the binary. The stripped binary vs. Deadsnakes comparison shows no significant performance difference. I also tried the test on a few other benchmarks, and the speedup seems consistent. I think these results need further investigation, though; a full benchmark in a more consistent environment would be good.

The other open question is how stripping these symbols would affect usage. That's not clear to me, and we would need to weigh it against the small performance bump. There seem to be other open tickets requesting more debug info, so I'm not sure whether these symbols are doing anything at all.

blopker commented 8 months ago

Interesting. It looks like the python:slim image variants are stripped:

root@985e385a5760:/app# file /usr/local/bin/python3.12
/usr/local/bin/python3.12: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=c421fbb49476f1727009a04fcaf0c49e6a81a615, for GNU/Linux 3.7.0, stripped

And indeed, the slim binaries are faster than non-slim:

+-----------+---------------------+-------------------------+--------------+-----------------------+
| Benchmark | pydocker_float.json | pydockerslim_float.json | Change       | Significance          |
+===========+=====================+=========================+==============+=======================+
| float     | 63.5 ms             | 60.6 ms                 | 1.05x faster | Significant (t=12.30) |
+-----------+---------------------+-------------------------+--------------+-----------------------+

Since people already use the slim variant to reduce image size, I think it makes sense to use it when you want slightly better performance at the cost of "debuggability". Maybe this performance difference should be documented somewhere, but I think the answer to this issue is simply to use the slim images.

wbolster commented 1 week ago

see also https://discuss.python.org/t/why-python-in-debian-docker-image-is-faster-than-official-dockers-python-image/53976/11

3052 commented 1 week ago

> The tests for "dockerbinary" ran without docker - I copied the python version distributed via docker to my host machine and proceeded to execute the tests with that directly on my host; and still observed the ~11% performance overhead.

@SimonLammer you should really add this to the title and/or the original post. Otherwise the answer is essentially "duh": the trade-off for Docker is more security and a specified environment at the cost of speed.

blopker commented 1 week ago

> see also https://discuss.python.org/t/why-python-in-debian-docker-image-is-faster-than-official-dockers-python-image/53976/11

Note that these benchmarks don't use the slim variant of the official Docker image, which has additional optimizations (its binaries are stripped).
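One way to check which build options each interpreter actually uses is to print a few build-time variables from `sysconfig`, running the script under each interpreter or inside each image being compared. `CONFIG_ARGS`, `CFLAGS` and `Py_ENABLE_SHARED` are standard config variables on POSIX builds but may be `None` on other platforms; `build_info.py` is a hypothetical file name, and this sketch only reports settings, it does not judge them.

```python
import json
import platform
import sys
import sysconfig

BUILD_VARS = ("CONFIG_ARGS", "CFLAGS", "Py_ENABLE_SHARED")


def build_info():
    """Collect version and selected build-time settings of the running interpreter."""
    info = {
        "executable": sys.executable,
        "version": platform.python_version(),
        "compiler": platform.python_compiler(),
    }
    for name in BUILD_VARS:
        info[name] = sysconfig.get_config_var(name)  # None if the variable is unknown
    return info


if __name__ == "__main__":
    # e.g. python3 build_info.py  (once per interpreter or image being compared)
    print(json.dumps(build_info(), indent=2))
```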