fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License

Fabio disconnects gRPC connection randomly #759

Open Cronnay opened 4 years ago

Cronnay commented 4 years ago

We have a gRPC service running in Nomad that works as intended, with Fabio in front of it as a load balancer. When our gRPC client connects through Fabio, the connection closes unexpectedly after the server has been streaming for a couple of minutes. On the client side we get "grpc: the client connection is closing" or "Cancelled: context canceled", and we see similar errors on the server.

EDIT: Following the traffic with tcpdump, I can see that the connection from the client to Fabio is closed, but the connection from Fabio to the gRPC service is not. tcpdump shows the server still pushing rows to Fabio.

We don't get this issue when connecting directly to the port Nomad assigns to the service. That way we can run a query for 15+ minutes without any problems, so I think this is an issue with Fabio.
I'd appreciate any kind of help.

This is our Fabio job:

job "lb" {
  datacenters = ["dc1"]
  type = "system"

  group "fabio" {

    restart {
      attempts = 2
      delay = "30s"
      mode = "delay"
    }

    task "fabio" {
      driver = "docker"

      config {
        image = "fabiolb/fabio:1.5.13-go1.13.4"
        network_mode = "host"
      }

      env {
        proxy_addr           = ":8080;proto=http,:5160;proto=grpc"
        ui_color             = "blue lighten-1"
        ui_title             = "${NOMAD_DC}"
      }

      service {
        name = "${NOMAD_TASK_NAME}-http"
        tags = ["http", "lb"]
        port = "http"

        check {
          name     = "${NOMAD_TASK_NAME} http"
          type     = "tcp"
          interval = "60s"
          timeout  = "15s"
        }
      }

      service {
        name = "${NOMAD_TASK_NAME}-ui"
        tags = ["ui"]
        port = "ui"

        check {
          name     = "${NOMAD_TASK_NAME}-ui http"
          type     = "http"
          path     = "/health"
          interval = "60s"
          timeout  = "15s"

          check_restart {
            limit = 3
            grace = "90s"
            ignore_warnings = false
          }
        }
      }

      resources {
        cpu    = 500
        memory = 128
        network {
          mbits = 20

          port "http" {
            static = 8080
          }
          port "ui" {
            static = 9998
          }
          port "grpc" {
            static = 5160
          }
        }
      }
    }
  }
}
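For readers unfamiliar with the `proxy_addr` value in the job's `env` block: it defines two listeners, an HTTP listener on :8080 and a gRPC listener on :5160 (the one the client connects to). Listeners are separated by commas, and each one is an address followed by semicolon-separated key=value options. The Go sketch below shows how the value breaks down. `parseProxyAddr` and `Listener` are hypothetical names written for illustration; this is not Fabio's own config parser.

```go
package main

import (
	"fmt"
	"strings"
)

// Listener is a hypothetical type for illustration only,
// not Fabio's internal listener struct.
type Listener struct {
	Addr string            // listen address, e.g. ":5160"
	Opts map[string]string // options, e.g. {"proto": "grpc"}
}

// parseProxyAddr (hypothetical) splits a proxy_addr-style value:
// listeners are comma-separated, and each listener is an address
// followed by semicolon-separated key=value options.
func parseProxyAddr(s string) ([]Listener, error) {
	var out []Listener
	for _, part := range strings.Split(s, ",") {
		fields := strings.Split(strings.TrimSpace(part), ";")
		l := Listener{Addr: fields[0], Opts: map[string]string{}}
		if l.Addr == "" {
			return nil, fmt.Errorf("listener %q has no address", part)
		}
		for _, opt := range fields[1:] {
			k, v, ok := strings.Cut(opt, "=") // requires Go 1.18+
			if !ok {
				return nil, fmt.Errorf("option %q is not key=value", opt)
			}
			l.Opts[k] = v
		}
		out = append(out, l)
	}
	return out, nil
}

func main() {
	// The value used in the Nomad job above.
	ls, err := parseProxyAddr(":8080;proto=http,:5160;proto=grpc")
	if err != nil {
		panic(err)
	}
	for _, l := range ls {
		fmt.Printf("%s proto=%s\n", l.Addr, l.Opts["proto"])
	}
}
```

Running it prints one line per listener: `:8080 proto=http` and `:5160 proto=grpc`.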
sunliusi commented 4 years ago

Same question. Has anyone found a solution?

Cronnay commented 4 years ago

Hello @sunliusi

We decided to use Traefik instead. We spent over 20 hours debugging, so we are done with that. We didn't get any response from the maintainers or from anyone else in the community.

sunliusi commented 4 years ago

@Cronnay The lack of response is disheartening. Fabio's gRPC support is not good; you can't define custom routes for gRPC. I will try Traefik. Thank you.

tommyalatalo commented 4 years ago

@leprechau @nathanejohnson @pschultz @magiconair - do any of you have input on this issue? It's quite severe if it is in fact the case that fabio drops grpc streams incorrectly.

cmvoicu commented 4 years ago

Does Fabio actually support server streaming for the gRPC protocol? It seems the connection is dropped/closed after the request, and no events can be streamed after that.

ishworg commented 4 years ago

I think @andyroyle is the original author of the gRPC proxy. Just saying.

ishworg commented 4 years ago

@Cronnay what's the Fabio configuration behind that nomad task?

Cronnay commented 4 years ago

@ishworg The Nomad job above contains all of the configuration. We tried adding some other settings, such as timeouts, but we don't use a separate config file, only the Nomad job.

Jake-S6 commented 2 years ago

@Cronnay Did you run into any issues when migrating to Traefik? I am planning to do the same because of Fabio's poor gRPC docs/support.

sunliusi commented 2 years ago

Traefik is fine. What problems are you having?