aristanetworks / goarista

Fairly general building blocks used in Arista Go code and open-sourced for the benefit of all.
Apache License 2.0

gNMI fails with timeout error #33

Closed: jakechen0816 closed this issue 5 years ago

jakechen0816 commented 5 years ago

Hello, I am running into an issue.

When I dial into an Arista switch and subscribe to a path to get data from the switch, I get this error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: timed out waiting for server handshake.

Does this mean there are too many connection requests to the switch? I am not sure why it times out. Is the problem in my code, or is something wrong on the switch side? Thanks.

Some code:
ip := ipaddress
cfg := &aristagnmi.Config{}
cfg.Username = username
cfg.Password = password
cfg.Addr = ip + ":5909"
ctx := aristagnmi.NewContext(context.Background(), cfg)
client, err := aristagnmi.Dial(cfg)
if err != nil {
    fmt.Println(err)
}
respChan := make(chan *pb.SubscribeResponse)
errChan := make(chan error)
defer close(respChan)
defer close(errChan)

paths := append(make([]string, 0), "/lldp/interfaces/interface/neighbors")
subscribeOptions := &aristagnmi.SubscribeOptions{
    Mode:       "stream",
    StreamMode: "target_defined",
    Paths:      aristagnmi.SplitPaths(paths),
}

go aristagnmi.Subscribe(ctx, client, subscribeOptions, respChan, errChan)

for {
    select {
    case resp := <-respChan:
        .........
    case err := <-errChan:
        // got error here: timeout
    }
}

aaronbee commented 5 years ago

The most common issue is that the control-plane firewall on the device is blocking your connection. Is port 5909 open? In addition you can run "show management api gnmi" to see if the gnmi server started successfully.

jakechen0816 commented 5 years ago

Thanks for your reply. Yes, gNMI is running on 5909:

Enabled: Yes
Server: running on port 5909, in default VRF
SSL Profile: none
QoS DSCP: none

Actually, I am running two servers. The regular gNMI server is on port 5909, and I use it to subscribe to the tree on the switch. In addition, I am running the following on port 5910 to get PTP information from the switch:

daemon TerminAttr
    exec /usr/bin/TerminAttr -grpcaddr=<localmanagementIP of switch>:<port> -allowed_ips=<net>/<mask>
    no shutdown

paths2 := append(make([]string, 0), "/Sysdb/ptp/status/currentDS")
subscribeOptions2 := &aristagnmi.SubscribeOptions{
    Mode:       "stream",
    StreamMode: "target_defined",
    Paths:      aristagnmi.SplitPaths(paths2),
}
go aristagnmi.Subscribe(ctx2, client2, subscribeOptions2, respChan2, errChan)

The second subscription never returns an error.

I tried removing the second subscription (all code related to it), shutting down the daemon on the switch, and running only the regular gNMI server on 5909. Unfortunately, I still get the timeout error.

I am also trying to stop the goroutine when I get an error in the response, but I am really new to Go and have not found a way to stop the goroutine or the function it called. The CPU usage increases from 100% to 300% or more. I believe the subscription keeps running in the background until I terminate the Go program.

jakechen0816 commented 5 years ago

In my code, I set up a schedule to dial in to the Arista switch. If the connection is not set up successfully, it dials in again until the connection succeeds. I notice that after 3 to 5 attempts, the connection is established. However, the CPU usage reaches 300% or 500% on my Ubuntu system. It seems that all the attempts made before the successful one are still running in the background.

aaronbee commented 5 years ago

The first time OpenConfig is configured on the switch after restart it takes a few minutes to start up. You can check in the agent logs (show agent OpenConfig logs) at what time it says it starts serving on port 5909.

jakechen0816 commented 5 years ago

Thanks, Aaron. I did not restart the switch; it has been running the whole time, and only the Go client was restarted. I believe OpenConfig was already running when the Go client tried to dial in to the switch.

aaronbee commented 5 years ago

Please contact our support for further assistance.