Open SimonLammer opened 1 year ago
https://github.com/docker-library/python/issues/575 might have some useful ideas/info/discussion in it for you
Anything new about this?
For what it's worth, with_debug isn't known to have any performance impact, other than a bit larger binary size. There are several (non-Python) discussions about this, for example: https://stackoverflow.com/questions/8676466/how-do-debug-symbols-affect-performance-of-a-linux-executable-compiled-by-gcc
I'd guess the slowdown is likely due to container security overhead. You should try to run your tests with docker run --security-opt seccomp:unconfined. See: https://stackoverflow.com/questions/60840320/docker-50-performance-hit-on-cpu-intensive-code
Docker itself would add more overhead on top of that unless some security features are disabled (i.e. running the tests in docker with --privileged yielded very similar results to "dockerbinary"; standard docker took about twice as long as that).
The tests for "dockerbinary" ran without docker - I copied the python version distributed via docker to my host machine and proceeded to execute the tests with that directly on my host; and still observed the ~11% performance overhead.
I see, I didn't catch that the Docker Python binary was extracted and then tested. I looked around a bit more, though, and couldn't find any evidence that debug symbols hurt performance. If you still have the test set up, it looks like you can strip a binary after it has been compiled, using strip --strip-debug. That could be an easy way to test the theory. Otherwise, there might be something else going on.
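A minimal sketch of that experiment, assuming GNU binutils is installed and that working on a copy is preferable to modifying the interpreter in place. The helper names are hypothetical. If the interpreter is linked against a shared libpython (an assumption about the image build, not something confirmed in this thread), that library would likely need the same treatment.

```python
import shutil
import subprocess
from pathlib import Path


def strip_command(src: Path, dst: Path, debug_only: bool = True) -> list[str]:
    """Build a GNU strip invocation that writes the stripped copy to dst."""
    # --strip-debug removes only debugging symbols; --strip-all removes all symbols.
    flag = "--strip-debug" if debug_only else "--strip-all"
    return ["strip", flag, "-o", str(dst), str(src)]


def make_stripped_copy(src: Path, dst: Path, debug_only: bool = True) -> Path:
    """Write a stripped copy of src to dst, leaving src untouched."""
    if shutil.which("strip") is None:
        raise RuntimeError("strip (binutils) was not found on PATH")
    subprocess.run(strip_command(src, dst, debug_only), check=True)
    return dst
```

Calling make_stripped_copy(Path("/usr/local/bin/python3.12"), Path("/tmp/python3.12-stripped")) would then give a second binary to benchmark against the original.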
Cool, I ran some quick (read: could be unreliable) benchmarks with Python 3.12.1 from the official Docker image and from Deadsnakes. I also ran a test with a stripped version of the Docker binary. These tests were run inside Docker: the official binary within the official container, and the Deadsnakes binary in the latest Ubuntu container. All on my Mac M1 laptop.
I ran the float test from pyperformance in rigorous mode: pyperformance run -b float -r -o NAME.json
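For anyone repeating this, the run and comparison steps could be scripted roughly as follows. This is a sketch: the function names are hypothetical, and it assumes the tables in this thread came from pyperformance compare with the table output style.

```python
import subprocess


def run_command(benchmark: str, output: str) -> list[str]:
    """Build the benchmark invocation quoted above."""
    # -r selects rigorous mode; -o names the JSON result file.
    return ["pyperformance", "run", "-b", benchmark, "-r", "-o", output]


def compare_command(baseline: str, changed: str) -> list[str]:
    """Build a comparison of two result files, printed as a table."""
    return ["pyperformance", "compare", baseline, changed, "-O", "table"]


def execute(command: list[str]) -> None:
    """Run one of the commands above and fail loudly on a non-zero exit."""
    subprocess.run(command, check=True)
```

Running execute(run_command("float", "pydocker_float.json")) once per interpreter and then execute(compare_command("pydocker_float.json", "pydead_float.json")) should produce a comparison in the same shape.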
Results: Official Docker binary vs Deadsnakes
+-----------+---------------------+-------------------+--------------+----------------------+
| Benchmark | pydocker_float.json | pydead_float.json | Change       | Significance         |
+===========+=====================+===================+==============+======================+
| float     | 63.5 ms             | 60.8 ms           | 1.04x faster | Significant (t=9.11) |
+-----------+---------------------+-------------------+--------------+----------------------+
Official Docker binary vs same binary, but with strip --strip-all applied:
+-----------+---------------------+------------------------------+--------------+----------------------+
| Benchmark | pydocker_float.json | pydocker_float_stripped.json | Change       | Significance         |
+===========+=====================+==============================+==============+======================+
| float     | 63.5 ms             | 61.2 ms                      | 1.04x faster | Significant (t=9.97) |
+-----------+---------------------+------------------------------+--------------+----------------------+
And finally, stripped official binary vs Deadsnakes:
+-----------+------------------------------+-------------------+--------------+-----------------+
| Benchmark | pydocker_float_stripped.json | pydead_float.json | Change       | Significance    |
+===========+==============================+===================+==============+=================+
| float     | 61.2 ms                      | 60.8 ms           | 1.01x faster | Not significant |
+-----------+------------------------------+-------------------+--------------+-----------------+
Analysis: While I'm not seeing the 11% performance difference, there seems to be at least a 4% speedup when stripping the debug symbols. The stripped binary vs Deadsnakes does not show a significant performance difference. I also tried the test on a few other benchmarks and the speedup seems consistent. I think these results need further investigation though. A full benchmark in a more consistent environment would be good.
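To make the arithmetic behind the "Change" column explicit, here is a small sketch that recomputes the ratios from the mean timings in the tables above. The dictionary keys and function names are made up for illustration.

```python
# Mean timings in milliseconds, copied from the tables above.
MEANS_MS = {
    "docker": 63.5,
    "docker_stripped": 61.2,
    "deadsnakes": 60.8,
}


def speedup(baseline_ms: float, changed_ms: float) -> float:
    """How many times faster the changed build is than the baseline."""
    return baseline_ms / changed_ms


def time_saved_percent(baseline_ms: float, changed_ms: float) -> float:
    """Reduction in run time relative to the baseline, in percent."""
    return (baseline_ms - changed_ms) / baseline_ms * 100


for name in ("docker_stripped", "deadsnakes"):
    ratio = speedup(MEANS_MS["docker"], MEANS_MS[name])
    saved = time_saved_percent(MEANS_MS["docker"], MEANS_MS[name])
    print(f"docker -> {name}: {ratio:.2f}x faster, {saved:.1f}% less time")
```

By this measure, stripping saves about 3.6% of the run time, which rounds to the 1.04x shown in the table, and the Deadsnakes build saves about 4.3%.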
The other open question is how stripping these symbols would affect usage. That's not clear to me, and we would need to weigh that against the small performance bump. There seem to be other open tickets requesting more debug info, so I'm not sure whether these symbols are doing anything at all.
Interesting. It looks like the python:slim image variants are stripped:
root@985e385a5760:/app# file /usr/local/bin/python3.12
/usr/local/bin/python3.12: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=c421fbb49476f1727009a04fcaf0c49e6a81a615, for GNU/Linux 3.7.0, stripped
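The same check can be automated when comparing several images. The sketch below classifies the description printed by file(1); it relies on the usual wording ("stripped", "not stripped", "with debug_info"), and the function names are hypothetical.

```python
import subprocess


def classify_file_output(description: str) -> str:
    """Classify a file(1) description of an ELF binary.

    Returns "debug_info", "not stripped", "stripped", or "unknown".
    """
    # Order matters: "not stripped" also contains the word "stripped".
    if "with debug_info" in description:
        return "debug_info"
    if "not stripped" in description:
        return "not stripped"
    if "stripped" in description:
        return "stripped"
    return "unknown"


def describe(path: str) -> str:
    """Run file(1) in brief mode on path and return its description."""
    result = subprocess.run(
        ["file", "-b", path], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

For example, classify_file_output(describe("/usr/local/bin/python3.12")) would report "stripped" for the slim binary shown above.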
And indeed, the slim binaries are faster than non-slim:
+-----------+---------------------+-------------------------+--------------+-----------------------+
| Benchmark | pydocker_float.json | pydockerslim_float.json | Change       | Significance          |
+===========+=====================+=========================+==============+=======================+
| float     | 63.5 ms             | 60.6 ms                 | 1.05x faster | Significant (t=12.30) |
+-----------+---------------------+-------------------------+--------------+-----------------------+
Since people do use the slim package for optimizing file size, I think it makes sense to use it when you want to get a bit better performance at the cost of "debuggability". Maybe this performance difference should be documented somewhere, but I think the answer to this issue is just to use the slim images.
The tests for "dockerbinary" ran without docker - I copied the python version distributed via docker to my host machine and proceeded to execute the tests with that directly on my host; and still observed the ~11% performance overhead.
@SimonLammer you should really add this to the title and/or original post. Otherwise the answer is essentially "duh": the trade-off for Docker is more security and a specified environment at the cost of speed.
Note that these benchmarks are not using the slim variant of the Official Docker image, which has additional optimizations.
I've observed a roughly 11% performance overhead when using the python distribution shipped with the python:3 image, compared to the python distribution installable through ppa:deadsnakes/ppa: https://stackoverflow.com/a/76133102/2808520
My guess is that the with debug_info compilation introduces this ~11% performance overhead.
I'd appreciate it if someone could tell me whether my guess is correct.