envoyproxy / nighthawk

L7 (HTTP/HTTPS/HTTP2/HTTP3) performance characterization tool
Apache License 2.0

Configuring nighthawk-client not to reuse TLS sessions makes results 10 times worse (and utilization of the CPU/RAM assigned to the nighthawk server is very low, less than 10%) #853

Open lzielezinska opened 2 years ago

lzielezinska commented 2 years ago

Title: Results for a nighthawk client configured not to reuse TLS sessions are 10 times lower (and utilization of the CPU/RAM assigned to the nighthawk server is very low, less than 10%)

Description: Command used to run nighthawk: nighthawk_client -p http2 --max-requests-per-connection 1 --address-family v4 "https://<IP>:<PORT>" --concurrency auto --rps "1000" --connections "$10" --duration "60" --request-body-size "400" --transport-socket '{"name": "envoy.transport_sockets.tls", "typed_config": { "@type":"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext","max_session_keys":"0"}}'

When the parameter --transport-socket is not passed, we noticed 10 times better results and almost 100% utilization of the CPU assigned to nighthawk_server. We would like to run nighthawk in a way that the client does not reuse TLS sessions; the setting "max_session_keys":"0" configures that.

Reproduction steps: Running nighthawk with the parameter: nighthawk_client ... --transport-socket '{"name": "envoy.transport_sockets.tls", "typed_config": { "@type":"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext","max_session_keys":"0"}}' ... forces the Nighthawk client not to reuse sessions, but makes the results very low and leaves the Nighthawk server almost idle.
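For context, max_session_keys in Envoy's UpstreamTlsContext controls the size of the client-side TLS session cache, so setting it to 0 disables session resumption and forces a full handshake on every new connection. A minimal sketch of the two runs being compared, assuming a placeholder target of https://127.0.0.1:443 (the address, port, and remaining flags are illustrative, not the exact values from the report above):

```shell
# Baseline run: TLS session resumption enabled (Envoy's default session cache).
nighthawk_client -p http2 --address-family v4 --concurrency auto \
  --rps 1000 --duration 60 "https://127.0.0.1:443"

# Same run with resumption disabled: max_session_keys=0 means the client
# caches no TLS sessions, so every new connection performs a full handshake.
nighthawk_client -p http2 --max-requests-per-connection 1 --address-family v4 \
  --concurrency auto --rps 1000 --duration 60 \
  --transport-socket '{"name": "envoy.transport_sockets.tls", "typed_config": {
    "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
    "max_session_keys": "0"}}' \
  "https://127.0.0.1:443"
```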

mum4k commented 2 years ago

Thank you for the bug report @lzielezinska. Nighthawk reuses significant portions of the Envoy code which includes code that establishes TLS sessions. It is possible that you have discovered a hot spot in the Envoy code.

To confirm this, would you be able to provide a few more details?

  1. The full command line and nighthawk output (including all the counters) for the execution without the custom transport socket.
  2. The full nighthawk output (including all the counters) for the execution with the custom transport socket.
  3. Could you also indicate what this is running against (what is the server). Is it the Nighthawk test server or some other system? If it is another system - were you able to verify its CPU usage during the experiment with the custom transport socket?

We can look to see if there is anything obvious in the outputs and counters. Once we have this, we can try to reproduce this problem and try to profile the execution.
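To make the counter dumps requested in items 1 and 2 easy to share and diff, the runs can be captured as machine-readable output. A hedged sketch, assuming Nighthawk's --output-format flag with the json value (the flag exists, but the exact JSON layout may vary by version; the "..." stands for the rest of the command line from the report above):

```shell
# Capture the full result, including all counters, as JSON for each run.
nighthawk_client ... --output-format json > run_without_custom_socket.json
nighthawk_client ... --transport-socket '{...}' --output-format json > run_with_custom_socket.json
```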

pawelrxyz commented 2 years ago

Hello, let me share some info.

1.

2.

3.

mum4k commented 2 years ago

Thank you for providing the details @pawelrxyz. The lowered CPU usage suggests some thread/lock contention. I will try to reproduce this problem and profile it to identify the contended path.

mum4k commented 2 years ago

I reran the experiment and collected CPU and contention profiles. I wasn't able to reproduce the problem and did not see any unexpected behavior when I executed the test.

nighthawk_output_standard.txt is an experiment that runs without the custom TLS socket, i.e. session resumption was enabled. Confirmed by counters:

    counters {
      name: "ssl.handshake"
      value: 886082
    }
    counters {
      name: "ssl.session_reused"
      value: 885932
    }
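As a sanity check, the two counters above imply that nearly all handshakes in the baseline run were resumed sessions. A quick way to compute the rate (values copied from the counters above):

```shell
# Resumption rate implied by the counters above: reused sessions / handshakes.
handshakes=886082
reused=885932
awk -v h="$handshakes" -v r="$reused" \
  'BEGIN { printf "resumption rate: %.2f%%\n", 100 * r / h }'
# -> resumption rate: 99.98%
```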

nighthawk_output_custom_socket.txt is an experiment that runs with the custom TLS socket, i.e. session resumption was disabled. The counter ssl.session_reused is missing, confirming that sessions weren't reused.

Each test took about 5 min of wall time. I observed the following:

Since contentions went down, I no longer suspect lock contention. The increased CPU usage in the NH test server seems to be directly related to the need to negotiate TLS sessions which is expected. See the diff below which compares the two runs: nh_test_server_cpu_diff

The increased CPU usage in the NH client seems to be related to more calls to sched_yield() which I suspect is due to threads waiting for TLS session negotiation on the server side. When that is filtered out, the remaining increased CPU usage is also related to negotiation of TLS sessions. See the diff below which compares the two runs: nh_client_cpu_diff

Since I wasn't able to reproduce this, I could use your help here. For one, you could review this experiment and point out if I diverged from yours anywhere. Even better, if you can still reproduce this, could you provide both CPU and contention profiles from your run, so that we can see what the NH client and test server are doing while you are observing the decreased CPU usage?
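If the problem is still reproducible, a CPU profile of the running client can be collected with standard Linux tooling. A minimal sketch using perf (the process name, sampling frequency, and duration here are assumptions; adjust them to your setup):

```shell
# Sample the Nighthawk client's on-CPU stacks at 99 Hz for 30 seconds.
perf record -F 99 -g -p "$(pgrep -n nighthawk_client)" -- sleep 30

# Summarize the hottest call paths; look for TLS handshake frames
# versus time spent in sched_yield / futex waits, which would point
# at contention rather than handshake cost.
perf report --stdio | head -50
```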