FDio / govpp

Go toolset for the VPP.

GoVPP CLI: worker to main thread when using api : communication problem ? #192

Open malebeau opened 8 months ago

malebeau commented 8 months ago

Intro

This issue tracks a potential bug or misbehaviour when using several workers on VPP (tested on VPP 23.06).

Problem

We hit the issue when the VPP startup config defines several workers and a basic ping command is sent through the GoVPP CLI. Here are the logs (1 main thread + 2 workers):

vpp# sh threads
ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State
0      vpp_main                        160368  other (0)                1      0      0
1      vpp_wk_0            workers     160371  other (0)                2      0      0
2      vpp_wk_1            workers     160372  other (0)                3      0      0

vpp# sh hardware-interfaces
              Name                Idx   Link  Hardware
eth4                               1     up   eth4
  Link speed: unknown
  RX Queues:
    queue thread         mode
    0     vpp_wk_0 (1)   polling
  TX Queues:
    TX Hash: [name: hash-eth-l34 priority: 50 description: Hash ethernet L34 headers]
    queue shared thread(s)
    0     yes    0-2
  Ethernet address fa:16:3e:67:61:ef
  Red Hat Virtio

root@ddd:/home/lab# ./govpp -L trace cli ping 192.168.200.1
TRAC[0000]options.go:74 main.InitOptions() log level set to: trace
TRAC[0000]options.go:84 main.InitOptions() init global options: &{Debug:false LogLevel:trace Color:}
DEBU[0000]cmd_cli.go:127 main.newCliCommand.func1() provided 2 args
TRAC[0000]cmd_cli.go:227 main.newBinapiVppCli() connecting to VPP API socket "/run/vpp/api.sock"
vpp# ping 192.168.200.1
DEBU[0000]cmd_cli.go:192 main.runCliCmd() executing CLI command: ping 192.168.200.1
TRAC[0000]cmd_cli.go:255 main.(*vppcliBinapi).Execute() sending CLI command: "ping 192.168.200.1"

Statistics: 5 sent, 0 received, 100% packet loss

DEBU[0005]cmd_cli.go:269 main.(*vppcliBinapi).Close() disconnecting VPP API connection

We captured dpdk-input traces: the ICMP packets are received correctly, but the result does not seem to be reported back to the CLI. Is this because the API runs on the main thread?

Here are the logs (1 main thread):

vpp# sh threads
ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State
0      vpp_main                        160396  other (0)                1      0      0

vpp# sh hardware-interfaces
              Name                Idx   Link  Hardware
eth4                               1     up   eth4
  Link speed: unknown
  RX Queues:
    queue thread         mode
    0     main (0)       polling
  TX Queues:
    TX Hash: [name: hash-eth-l34 priority: 50 description: Hash ethernet L34 headers]
    queue shared thread(s)
    0     no     0
  Ethernet address fa:16:3e:67:61:ef

root@ddd:/home/lab# ./govpp -L trace cli ping 192.168.200.1
TRAC[0000]options.go:74 main.InitOptions() log level set to: trace
TRAC[0000]options.go:84 main.InitOptions() init global options: &{Debug:false LogLevel:trace Color:}
DEBU[0000]cmd_cli.go:127 main.newCliCommand.func1() provided 2 args
TRAC[0000]cmd_cli.go:227 main.newBinapiVppCli() connecting to VPP API socket "/run/vpp/api.sock"
vpp# ping 192.168.200.1
DEBU[0000]cmd_cli.go:192 main.runCliCmd() executing CLI command: ping 192.168.200.1
TRAC[0000]cmd_cli.go:255 main.(*vppcliBinapi).Execute() sending CLI command: "ping 192.168.200.1"
116 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=10.8021 ms
116 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=.3975 ms
116 bytes from 192.168.200.1: icmp_seq=3 ttl=64 time=.3383 ms
116 bytes from 192.168.200.1: icmp_seq=4 ttl=64 time=.2925 ms
116 bytes from 192.168.200.1: icmp_seq=5 ttl=64 time=.2718 ms

Statistics: 5 sent, 5 received, 0% packet loss

DEBU[0005]cmd_cli.go:269 main.(*vppcliBinapi).Close() disconnecting VPP API connection
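
To compare the two configurations from a script rather than by eye, the "Statistics:" line printed at the end of each run can be parsed. The following is a minimal sketch, assuming the line keeps the exact format shown in the logs above; parsePingStats is a hypothetical helper name and is not part of GoVPP.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// statsRe matches the summary line printed by the VPP ping command,
// e.g. "Statistics: 5 sent, 0 received, 100% packet loss".
var statsRe = regexp.MustCompile(`Statistics:\s*(\d+) sent,\s*(\d+) received`)

// parsePingStats (hypothetical helper) extracts the sent and received
// counters from CLI output. ok is false when no summary line is found.
func parsePingStats(output string) (sent, received int, ok bool) {
	m := statsRe.FindStringSubmatch(output)
	if m == nil {
		return 0, 0, false
	}
	sent, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, 0, false
	}
	received, err = strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return sent, received, true
}

func main() {
	// The two summary lines from the logs in this issue.
	outputs := []string{
		"Statistics: 5 sent, 0 received, 100% packet loss", // 2 workers
		"Statistics: 5 sent, 5 received, 0% packet loss",   // main thread only
	}
	for _, out := range outputs {
		sent, received, ok := parsePingStats(out)
		fmt.Printf("sent=%d received=%d parsed=%v lost_all=%v\n",
			sent, received, ok, ok && sent > 0 && received == 0)
	}
}
```

A run where every packet is reported lost while the dpdk-input trace shows replies arriving is the signature of the problem described here.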

Solution

This problem can be worked around by running VPP with only the main thread (no workers), or by placing the interface's RX queue on the main thread: set interface rx-placement eth4 queue 0 main
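
The rx-placement workaround can be generated from the "sh hardware-interfaces" output. The sketch below scans that output for RX queues polled by a thread other than main (thread index 0) and prints one command per queue. It assumes the output layout shown in the logs above and the "queue <n>" form of the rx-placement command; rxPlacementFixes is a hypothetical helper, not a GoVPP function.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// ifaceRe matches an interface row such as "eth4    1     up   eth4".
var ifaceRe = regexp.MustCompile(`^(\S+)\s+\d+\s+(up|down)\s+\S+`)

// rxQueueRe matches an RX queue row such as "    0     vpp_wk_0 (1)   polling".
// Group 1 is the queue number, group 3 the thread index.
var rxQueueRe = regexp.MustCompile(`^\s+(\d+)\s+(\S+)\s+\((\d+)\)\s+\S+\s*$`)

// rxPlacementFixes (hypothetical helper) returns one rx-placement command
// for every RX queue that is not polled by the main thread.
func rxPlacementFixes(output string) []string {
	var cmds []string
	var iface string
	inRx := false
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := sc.Text()
		if m := ifaceRe.FindStringSubmatch(line); m != nil {
			iface, inRx = m[1], false // a new interface section starts
			continue
		}
		switch trimmed := strings.TrimSpace(line); {
		case strings.HasPrefix(trimmed, "RX Queues:"):
			inRx = true
			continue
		case strings.HasPrefix(trimmed, "TX Queues:"):
			inRx = false
			continue
		}
		if !inRx || iface == "" {
			continue
		}
		if m := rxQueueRe.FindStringSubmatch(line); m != nil && m[3] != "0" {
			cmds = append(cmds,
				fmt.Sprintf("set interface rx-placement %s queue %s main", iface, m[1]))
		}
	}
	return cmds
}

func main() {
	// Excerpt of the 2-worker output from this issue.
	sample := `              Name                Idx   Link  Hardware
eth4                               1     up   eth4
  RX Queues:
    queue thread         mode
    0     vpp_wk_0 (1)   polling
  TX Queues:
    queue shared thread(s)
    0     yes    0-2
`
	for _, cmd := range rxPlacementFixes(sample) {
		fmt.Println(cmd)
	}
}
```

Moving every RX queue to the main thread removes the benefit of workers for that interface, so this only makes sense as a diagnostic step or a temporary workaround.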

malebeau commented 7 months ago

@sknat As discussed last week at Calicocon, I tried the VPP 24.02 release, without any more luck: same behaviour.

sknat commented 7 months ago

Hi @surfmax. This is probably due to the way ping operates in VPP. It was worth trying over the binary API as well (snippet below), but the problem seems to persist. As this is most probably a VPP issue, can you try reporting it to vpp-dev@lists.fd.io?

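// Imports needed by this snippet. The ping binapi path is assumed to follow
// the usual go.fd.io/govpp/binapi layout.
import (
    "context"
    "fmt"
    "log"
    "net"
    "sync"

    "go.fd.io/govpp/api"
    "go.fd.io/govpp/binapi/ip_types"
    "go.fd.io/govpp/binapi/ping"
)

func doPing(conn api.Connection) {
    // Subscribe before sending the request so the event cannot be missed.
    sub, err := conn.WatchEvent(context.Background(), (*ping.PingFinishedEvent)(nil))
    if err != nil {
        log.Fatalln("ERROR:", err)
    }
    var wg sync.WaitGroup
    wg.Add(1)

    go func() {
        // Release the waiter exactly once, even if the channel is closed.
        defer wg.Done()
        fmt.Println("waiting for events")
        defer fmt.Println("done waiting for events")

        for notif := range sub.Events() {
            e, ok := notif.(*ping.PingFinishedEvent)
            if !ok {
                // Print notif, not e: e is nil when the assertion fails.
                fmt.Printf("invalid notification type: %#v\n", notif)
                continue
            }
            fmt.Printf("Ping %+v\n", e)
            return // the WaitGroup expects a single event
        }
    }()

    c := ping.NewServiceClient(conn)

    // Ask VPP to run the ping and to notify this client when it finishes.
    reply, err := c.WantPingFinishedEvents(context.Background(), &ping.WantPingFinishedEvents{
        Address:  ip_types.NewAddress(net.ParseIP("20.0.0.2")),
        Repeat:   10,
        Interval: 1.0,
    })
    if err != nil {
        log.Fatalln("ERROR:", err)
    }
    fmt.Printf("ping ok: %+v\n", reply)
    wg.Wait() // blocks until the finished event arrives
}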
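
Because the snippet above blocks until an event arrives, a run where the finished event never shows up would hang instead of reporting a failure. A bounded wait makes that case visible. This is a minimal sketch using only the standard library: waitEvent, ErrTimeout and ErrClosed are hypothetical names, and the channel in main stands in for sub.Events() from the snippet.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinel errors for the two ways the wait can fail.
var (
	ErrTimeout = errors.New("timed out waiting for event")
	ErrClosed  = errors.New("event channel closed")
)

// waitEvent (hypothetical helper) returns the first value received from
// events, ErrClosed if the channel is closed first, or ErrTimeout if nothing
// arrives within the given duration.
func waitEvent[T any](events <-chan T, timeout time.Duration) (T, error) {
	var zero T
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case ev, ok := <-events:
		if !ok {
			return zero, ErrClosed
		}
		return ev, nil
	case <-timer.C:
		return zero, ErrTimeout
	}
}

func main() {
	// Stand-in for sub.Events(): nothing is ever sent, which mimics the
	// missing finished event.
	events := make(chan string)
	if _, err := waitEvent[string](events, 100*time.Millisecond); err != nil {
		fmt.Println("no ping event:", err)
	}
}
```

For a real ping the timeout would need to exceed Repeat times Interval, plus some margin for the last reply.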