Web api response is occasionally delayed

withlin commented 5 years ago

In our production environment, our webapi which it is several delays in an hour, such as: 6s 8s, even 11s(only read redis and then resonse data). So I try to see the load on the server

1.some Env:

System: ubuntu16.04

root@gxc01:~# docker version
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false

root@16d66511c5b2:/app# dotnet --info

Host (useful for support):
  Version: 2.1.2
  Commit:  811c3ce6c0

.NET Core SDKs installed:
  No SDKs were found.

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

use top （system :8 cores 18g）

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  
27824 root      20   0 24.343g 949104  55548 S  89.4  5.1   2514:57 dotnet                                                                                                                                   
17234 root      20   0 24.304g 829372  57604 S  10.6  4.5 437:41.96 dotnet                                                                                                                                   
10998 root      20   0 7885540 163428  45708 S   4.0  0.9 696:40.62 dotnet                                                                                                                                   
48450 root      20   0 7054140 123368  40500 S   3.6  0.7 167:18.73 dotnet                                                                                                                                   
26369 root      20   0 24.318g 795744  46608 S   3.3  4.3 132:03.66 dotnet                                                                                                                                   
37036 root      20   0 7194564 154824  43056 S   3.0  0.8 134:10.61 dotnet                                                                                                                                   
31036 root      20   0 6179088 152092  41072 S   2.6  0.8   3148:15 dotnet                                                                                                                                   
64992 root      20   0 6017452 137052  46828 S   2.3  0.7 136:33.42 dotnet                                                                                                                                   
2838 root      20   0 6715628  62676  21328 S   2.0  0.3   3116:59 dotnet                                                                                                                                   
8027 root      20   0 3941816  95132  35008 S   2.0  0.5 302:06.24 dotnet                                                                                                                                   
31557 root      20   0 6555328 128968  35876 S   2.0  0.7 658:25.17 dotnet                                                                                                                                   
27708 root      20   0 24.198g 252272  18728 S   1.3  1.4   1228:59 dotnet                                                                                                                                   
5301 root      20   0 6005620 124368  47896 S   1.0  0.7  32:54.30 dotnet                                                                                                                                   
28990 root      20   0 5992000 150784  47520 S   1.0  0.8  25:03.81 dotnet                                                                                                                                   
41201 root      20   0 6865976 159164  41460 S   1.0  0.9  68:21.38 dotnet                                                                                                                                   
58925 root      20   0 6661796 133560  43596 S   1.0  0.7  70:11.35 dotnet                                                                                                                                   
60405 root      20   0 6747872  95872  38564 S   1.0  0.5 818:22.40 dotnet                                                                                                                                   
7 root      20   0       0      0      0 S   0.7  0.0   1145:53 rcu_sched                                                                                                                                
2612 root      20   0 3943920  59944  15284 S   0.7  0.3 203:07.37 dotnet                                                                                                                                   
3107 root      20   0 24.221g 252092  18028 S   0.7  1.4 395:40.46 dotnet                                                                                                                                   
4187 root      20   0 7647892  74116   8320 S   0.7  0.4   2089:55 java                                                                                                                                     
19037 root      20   0 5932076 114500  38568 S   0.7  0.6  86:25.56 dotnet                                                                                                                                   
19546 root      20   0 3923700  83596  24948 S   0.7  0.5 206:54.95 dotnet                                                                                                                                   
21192 root      20   0 4736528 125612  42540 S   0.7  0.7  59:52.50 dotnet

The pid of 27824,which it is hight cpu(89.4%). sometimes it will rise to 150%。 The average CPU is 89.4%

use perf tool view it:

use lldb view it:

so i have some assumption: 1.Whether the program hangs because of garbage collection, and then waits for the garbage to be collected before responding.（such as: 6s 8s, even 11s(only read redis and then resonse data).） 2.The current sdk version is still very low( 2.1.2.), I don't know if it is a problem with the sdk version?

RussKeldorph commented 5 years ago

@maoni0 @jkotas @noahfalk Links to advice?

Maoni0 commented 5 years ago

the stack you are showing in lldb doesn't make a whole of sense (eg, gc_heap::background_written_addresses is just an array). would it be possible to fix the symbols?

but assuming the 4 frames you highlighted are showing correct symbols, this is not the same thread that's shown in perf. in perf you are showing a threadpool thread which cannot be a server GC thread which is what the lldb stack is showing. also the stack in lldb most likely just says there's no GC's happening so that GC thread is blocked.

in general we recommend folks to collect CLR events with this kind of investigation. looping in @billwert. Bill, would you have time to help with this investigation?

withlin commented 5 years ago

Hi @Maoni0,thank you for your reply,it is my first time using lldb.I am not very familiar with its usage, I try to dump the stack again. Have you any tools view hight cpu problem in linux?

billwert commented 5 years ago

Hi @withlin!

We have docs for Linux performance tracing here. If you take a trace with perfcollect and share it somewhere, we can take a look at what's going on.

withlin commented 5 years ago

hi @billwert

i have fllow the doc,which is not tracing. excuete log :

root@zd:~# docker exec -it --privileged  c50032b01447  /bin/bash

root@c50032b01447:/tmp# ./perfcollect collect sampleTrace
Collection started. Press CTRL+C to stop.

...STOPPED.

Starting post-processing. This may take some time.

zero-sized file (perf.data), nothing to do!
Generating native image symbol files
zero-sized file (perf.data), nothing to do!
libcoreclr.so not found in perf data. Please verify that your .NET Core process is running and consuming CPU.
Saving native symbols
zero-sized file (perf.data), nothing to do!
...FINISHED
Exporting perf.data file
...FINISHED
Compressing trace files
...FINISHED
Cleaning up artifacts
...FINISHED

Trace saved to sampleTrace.trace.zip

link issue how to fix it ? Would you tell me? Tks.

dotnet-policy-service[bot] commented 7 months ago

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

dotnet / runtime

Web api response is occasionally delayed #13042