containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0
23.77k stars 2.42k forks source link

libpod profiling and optimization. #1691

Closed kunalkushwaha closed 6 years ago

kunalkushwaha commented 6 years ago

Description

I understand, there are already work going on for integration and performance optimization as discussed in https://github.com/containers/libpod/issues/1656#issuecomment-431185424 & https://github.com/containers/libpod/issues/1333#issuecomment-415687606.

To understand to consume libpod as library and performance profiling, I wrote a small program https://gist.github.com/kunalkushwaha/4504487b0b43dbbe1421069cd50eb08c#file-perftest-go, which consume libpod as library and runs containers( create, start, stop, delete) in loop. Also it measures the time of each operation in terms of min, max and average and outputs pprof CPU profile data to analyse.

sample output

Container Profile data

        Create  Start   Stop    Delete
Min     0.23s   0.37s   10.12s  0.00s
Avg     0.28s   0.45s   10.33s  0.00s
Max     0.50s   0.75s   10.13s  0.00s
Total   13.89s  22.42s  516.56s 0.00s
2018/10/22 16:50:46 profile: cpu profiling disabled, /tmp/profile419673667/cpu.pprof

There are one many places where optimization cab be done, but I found encoding/json library which adds visible CPU time to create & start operations. Check https://gist.github.com/kunalkushwaha/4504487b0b43dbbe1421069cd50eb08c#file-podman-profile-svg.

I think replacing encoding/json with library like "github.com/json-iterator/go" (faster by 4x-5x), which is already used by various container projects and have same interfaces as encoding/json package could be helpful.

Please let me know your views on this.

Steps to reproduce the issue:

1.

2.

3.

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

(paste your output here)

Output of podman info:

(paste your output here)

Additional environment details (AWS, VirtualBox, physical, etc.):

mheon commented 6 years ago

Alright, I replied via email, but Github is being wierd today, so I'll respond here as well.

I'm surprised you're seeing JSON as a perf issue - our own profiling suggests it's not a big factor. We're already using a high-speed JSON encode/decode library (Easyjson), so I doubt we'll get much benefit moving to another third-party library.

We're seeing most time being spent during start as being in the chrootuser library, and waiting for Conmon to start (most of our runtime is spent there)

baude commented 6 years ago

I think it depends on which kind of profile is run. I see JSON in the CPU traces for sure

baude commented 6 years ago

Also, I think that some of the json decoding is a red herring because we call out to conmon from "within" a json decode call.

giuseppe commented 6 years ago

how much is spent in the OCI runtime? I think creating a network namespace is an expensive operation, some time ago I've done some profiling here, do you notice significant difference if you use --net host?

mheon commented 6 years ago

@giuseppe I think per @baude we spend somewhere between 25% and 40% of Start() in Conmon and the OCI runtime.

We also noticed that network namespace creation is expensive. @baude is also working to try and alleviate this by doing it in parallel with other setup, but I think that's going to need more debate before merging that

mheon commented 6 years ago

We're using a high-speed JSON library already (Easyjson) so I'm surprised that you're seeing that much time in encode/decode - most of our profiles are showing that create spends a lot of time in the chrootuser code and waiting for Conmon to start.

On Mon, Oct 22, 2018, 05:12 Kunal Kushwaha notifications@github.com wrote:

Description

I understand, there are already work going on for integration and performance optimization as discussed in #1656 (comment) https://github.com/containers/libpod/issues/1656#issuecomment-431185424 & #1333 (comment) https://github.com/containers/libpod/issues/1333#issuecomment-415687606.

To understand to consume libpod as library and performance profiling, I wrote a small program https://gist.github.com/kunalkushwaha/4504487b0b43dbbe1421069cd50eb08c#file-perftest-go, which consume libpod as library and runs containers( create, start, stop, delete) in loop. Also it measures the time of each operation in terms of min, max and average and outputs pprof CPU profile data to analyse.

sample output

Container Profile data

    Create  Start   Stop    Delete

Min 0.23s 0.37s 10.12s 0.00s Avg 0.28s 0.45s 10.33s 0.00s Max 0.50s 0.75s 10.13s 0.00s Total 13.89s 22.42s 516.56s 0.00s 2018/10/22 16:50:46 profile: cpu profiling disabled, /tmp/profile419673667/cpu.pprof

There are one many places where optimization cab be done, but I found encoding/json library which adds visible CPU time to create & start operations. Check https://gist.github.com/kunalkushwaha/4504487b0b43dbbe1421069cd50eb08c#file-podman-profile-svg .

I think replacing encoding/json with library like " github.com/json-iterator/go" (faster by 4x-5x), which is already used by various container projects and have same interfaces as encoding/json package could be helpful.

Please let me know your views on this.

Steps to reproduce the issue:

1. 2. 3.

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

(paste your output here)

Output of podman info:

(paste your output here)

Additional environment details (AWS, VirtualBox, physical, etc.):

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/containers/libpod/issues/1691, or mute the thread https://github.com/notifications/unsubscribe-auth/AHYHCJwy3Bz76khTkF-ssHhHZ-mrlwTYks5unYvkgaJpZM4XzBim .

mheon commented 6 years ago

Hm. The json-iterator package doesn't seem to require codegen. That's a significant advantage over easyjson (it's slightly faster, too). This is worth looking into even if it's not a big perf improvement. Thanks @kunalkushwaha

kunalkushwaha commented 6 years ago

Due to github issue, I guess my files were not visible. The graph is visible now. CPU profile graph

I have not created network in my tests, So network part is not profiled.

The main CPU consuming part is storage layer both in Create & Start.

Though I agree, It wont improve the performance drastically, but will help in overall improvement as there will be more limitations in improving storage and network performance.

Will share more, once I dig it more.

mheon commented 6 years ago

c/storage is using FFJSON throughout AFAIK. There may be issues moving from FFJSON to alternative JSON encoders (it isn't 100% compatible with the builtin encoding/json library and anything meant to be compatible with said library)

baude commented 6 years ago

@kunalkushwaha I'll take a look at this tomorrow but I think the JSON showing up in the mountstorage is a red herring because the marshalling is wrapped in a response back from conmon.

baude commented 6 years ago

mind checking your profiles with master now ?

kunalkushwaha commented 6 years ago

Looks much better now :+1:

runc version spec: 1.0.1-dev
2018/10/29 10:07:39 profile: cpu profiling enabled, /tmp/profile683655560/cpu.pprof
Test Round: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
Container Profile data

        Create  Start   Stop    Delete
Min     0.23s   0.32s   2.12s   0.47s
Avg     0.24s   0.38s   2.17s   0.53s
Max     0.28s   1.14s   2.15s   0.75s
Total   12.18s  18.78s  108.55s 26.41s
2018/10/29 10:10:25 profile: cpu profiling disabled, /tmp/profile683655560/cpu.pprof

also CPU load looks better (not spending much time on json etc.) screenshot_2018-10-29 perftest cpu

vrothberg commented 6 years ago

Can we close this issue?

kunalkushwaha commented 6 years ago

@vrothberg have create a PR for adding a contrib tool perftest, which I used for profiling. I think perftest could be helpful for profiling and stress-test. PTAL https://github.com/containers/libpod/pull/1757

kunalkushwaha commented 6 years ago

Closing this, json parsing is no-more the real issue.