delimitrou / DeathStarBench

Open-source benchmark suite for cloud microservices
Apache License 2.0

wrk keeps calling reconnect_socket while generating workloads on Linux #283

Open Zhongqi0402 opened 1 year ago

Zhongqi0402 commented 1 year ago

Hello all, I've been following the steps in the README and I'm running into the problem below. When I run ../wrk2/wrk -D exp -t2 -c100 -d30s -L -s ./wrk2/scripts/hotel-reservation/mixed-workload_type_1.lua http://0.0.0.0:5000/tcp -R2000 in hotelReservation (and in the other microservices as well), I keep getting reconnect_socket printed to stdout. I added some debug printf statements, and the calls come from the socket_writeable and socket_readable functions in wrk.c. Since the socket never connects properly, the final output is:

-----------------------------------------------------------------------
Test Results @ http://0.0.0.0:5000/tcp 
  Thread Stats   Avg      Stdev     99%   +/- Stdev
    Latency     -nanus    -nanus   0.00us    0.00%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    0.00us
 75.000%    0.00us
 90.000%    0.00us
 99.000%    0.00us
 99.900%    0.00us
 99.990%    0.00us
 99.999%    0.00us
100.000%    0.00us

  Detailed Percentile spectrum:
       Value   Percentile   TotalCount 1/(1-Percentile)

       0.000     1.000000            0          inf
#[Mean    =         -nan, StdDeviation   =         -nan]
#[Max     =        0.000, Total count    =            0]
#[Buckets =           27, SubBuckets     =         2048]
-----------------------------------------------------------------------
  0 requests in 30.05s, 0.00B read
  Socket errors: connect 0, read 28960, write 30469, timeout 0
Requests/sec:      0.00  
Transfer/sec:       0.00B
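Since the latency and throughput figures above are all zero, the lines worth comparing between attempts are the request count and the "Socket errors" counters. Below is a minimal Python sketch for pulling them out of saved wrk2 output; the helper names are hypothetical and nothing here is part of wrk2 itself. It only flags the pattern seen above (zero completed requests, only read/write errors); it does not explain why the server side fails.

```python
import re

# Summary lines printed by wrk/wrk2 at the end of a run, e.g.
#   "  0 requests in 30.05s, 0.00B read"
#   "  Socket errors: connect 0, read 28960, write 30469, timeout 0"
# wrk only prints the "Socket errors" line when at least one error occurred.
REQUESTS = re.compile(r"(\d+) requests in ([\d.]+)s")
SOCKET_ERRORS = re.compile(
    r"Socket errors: connect (\d+), read (\d+), write (\d+), timeout (\d+)"
)
ERROR_KINDS = ("connect", "read", "write", "timeout")


def summarize(output: str) -> dict:
    """Extract the request count, run duration and socket-error counters."""
    summary = {"requests": 0, "seconds": 0.0}
    summary.update({kind: 0 for kind in ERROR_KINDS})
    m = REQUESTS.search(output)
    if m:
        summary["requests"] = int(m.group(1))
        summary["seconds"] = float(m.group(2))
    m = SOCKET_ERRORS.search(output)
    if m:
        summary.update({kind: int(v) for kind, v in zip(ERROR_KINDS, m.groups())})
    return summary


def only_socket_failures(summary: dict) -> bool:
    """True when nothing completed and the only errors were read/write errors."""
    return (
        summary["requests"] == 0
        and summary["connect"] == 0
        and summary["timeout"] == 0
        and (summary["read"] + summary["write"]) > 0
    )
```

For example, redirect the wrk2 run to a file and call summarize(open(path).read()) on it after each change to the setup.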

Below is a summary of the system I'm running on:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  128
  On-line CPU(s) list:   0-127
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7B13 64-Core Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  2
    Core(s) per socket:  64
    Socket(s):           1
    Stepping:            1
    Frequency boost:     enabled
    CPU max MHz:         3539.7939
    CPU min MHz:         1500.0000
    BogoMIPS:            4499.78

Does anyone know how to resolve this?

Thanks

gy1005 commented 1 year ago

Hi, thanks for posting the issue. I am not able to reproduce the error you reported. Would you mind sharing more details, such as the error log of the frontend service from docker logs hotel_reserv_frontend? I suspect the following, which you can check:
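Scanning a container log for error-level lines is easy to script. A minimal Python sketch, assuming the services log in zerolog's console format (the INF/PNC prefixes that appear in the frontend log in this thread) and that the docker CLI is on PATH; severe_lines and container_logs are hypothetical helper names.

```python
import re
import subprocess

# zerolog console lines look like "<timestamp> <LVL> <caller> > <message>".
LINE = re.compile(r"^(\S+) (TRC|DBG|INF|WRN|ERR|FTL|PNC) (\S+) > (.*)$")
SEVERE = {"ERR", "FTL", "PNC"}


def severe_lines(log_text: str) -> list[tuple[str, str, str]]:
    """Return (level, caller, message) for every ERR/FTL/PNC line."""
    hits = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(2) in SEVERE:
            hits.append((m.group(2), m.group(3), m.group(4)))
    return hits


def container_logs(name: str) -> str:
    """Run `docker logs <name>`; container stderr is replayed on stderr, so merge both."""
    result = subprocess.run(
        ["docker", "logs", name], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

For example, severe_lines(container_logs("hotel_reserv_frontend")) lists any panic or error the frontend logged before it stopped.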

Zhongqi0402 commented 1 year ago

Thank you so much for your response, and sorry for my late reply. Yes, I will check the points you mentioned.

Zhongqi0402 commented 1 year ago

I looked at the frontend logs with the command sudo docker logs hotel_reserv_frontend; below is part of the output:

2023-11-09T02:24:14Z INF cmd/frontend/main.go:23 > Reading config...
2023-11-09T02:24:14Z INF cmd/frontend/main.go:39 > Read target port: 5000
2023-11-09T02:24:14Z INF cmd/frontend/main.go:40 > Read consul address: consul:8500
2023-11-09T02:24:14Z INF cmd/frontend/main.go:41 > Read jaeger address: jaeger:6831
2023-11-09T02:24:14Z INF cmd/frontend/main.go:48 > Initializing jaeger agent [service name: frontend | host: jaeger:6831]...
2023-11-09T02:24:14Z INF tracing/tracer.go:26 > Jaeger client: adjusted sample ratio 0.010000
2023-11-09T02:24:14Z PNC cmd/frontend/main.go:51 > Got error while initializing jaeger agent: lookup jaeger: Temporary failure in name resolution
panic: Got error while initializing jaeger agent: lookup jaeger: Temporary failure in name resolution
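The panic in the last two lines means the frontend could not resolve the service name jaeger while initializing tracing, so the frontend process stops at that point. Name resolution can be checked with a getaddrinfo call; on glibc, the EAI_AGAIN error from getaddrinfo reads "Temporary failure in name resolution". A minimal Python sketch, assuming it runs in a container attached to the same Docker network that has Python installed (the frontend image may not ship it); resolve is a hypothetical helper name.

```python
import socket


def resolve(host: str) -> list[str]:
    """Return the addresses `host` resolves to via getaddrinfo, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as err:
        # e.g. EAI_AGAIN ("Temporary failure in name resolution") or EAI_NONAME
        print(f"{host}: lookup failed ({err})")
        return []
    # info[4] is the sockaddr tuple; its first element is the IP address string.
    return sorted({info[4][0] for info in infos})
```

For example, resolve("jaeger") returning [] from a container on that network, while resolve("localhost") works, would suggest a problem with Docker's service-name DNS rather than with the frontend code.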

It looks like a DNS resolution error, so I also checked the logs of the jaeger container; below is the output:

{"level":"warn","ts":1699496743.1466587,"caller":"grpc@v1.58.3/clientconn.go:1515","msg":"[core][Channel #12 SubChannel #13] grpc: addrConn.createTransport failed to connect to {Addr: \"localhost:16685\", ServerName: \"localhost:16685\", }. Err: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:16685: connect: connection refused\"","system":"grpc","grpc_log":true}

I'm not familiar with Docker networking in general, but based on what I found online, shouldn't it be dialing port 16686 rather than 16685?

I ran sudo docker exec -it hotel_reserv_frontend curl http://jaeger:16686 to check connectivity, and it works fine. But when I ping port 16685, there is a problem. I'm very confused about what's happening now. Can someone please give me some pointers?
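ping uses ICMP and has no notion of ports, so a plain TCP connect is the closer equivalent of the curl check for 16685. Per Jaeger's documentation, 16686 is the query service's HTTP port (UI and HTTP API) and 16685 is its gRPC port. Note also that the warning above was logged inside the jaeger container while dialing 127.0.0.1:16685, i.e. its own gRPC port, not a connection coming from the frontend. A minimal Python sketch of a TCP probe; port_open is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS (gaierror) failures.
        return False
```

For example, port_open("jaeger", 16685) from a container on the same network. The 6831 address in the frontend log is Jaeger's UDP agent port, so a TCP probe does not apply to it.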

Thank you