bufbuild / httplb

Client-side load balancing for net/http
https://pkg.go.dev/github.com/bufbuild/httplb
Apache License 2.0
47 stars 2 forks source link

Bug: polling health checker will always fail #69

Open humbornjo opened 1 hour ago

humbornjo commented 1 hour ago

We run into some trouble with client conn control, and trying to use the health check. however, the default health checker will always fail with its context deadline exceeded. I think its because the line 332 in httplb/balancer.go, should use withtimeout instead of withcancel.

connection := conns[i]
connCtx, connCancel := context.WithCancel(b.ctx)
healthChecker := b.healthChecker.New(connCtx, connection, b)
go func() {
    defer connCancel()
    if err := connection.Prewarm(connCtx); err == nil {
        b.warmedUp(connection)
    }
}()

here is my test code

// server.go
package main

import (
    "github.com/gofiber/fiber/v2"
    "github.com/gofiber/fiber/v2/middleware/healthcheck"
    "github.com/gofiber/fiber/v2/middleware/recover"
)

func main() {
    app := fiber.New()

    app.Use(recover.New())
    app.Use(healthcheck.New(
        healthcheck.Config{
            LivenessEndpoint: "/healthz",
            LivenessProbe: func(c *fiber.Ctx) bool {
                c.Send([]byte("OK"))
                return true
            }},
    ))
    app.Get("/foo", func(c *fiber.Ctx) error {
        return c.Send([]byte("bar"))
    })
    app.Get("/", func(c *fiber.Ctx) error {
        return c.Send([]byte("bar"))
    })
    if err := app.Listen(":3000"); err != nil {
        panic(err)
    }
}

// client.go
package main

import (
    "fmt"
    "net/http"
    "net/url"
    "time"

    "github.com/bufbuild/httplb"
    "github.com/bufbuild/httplb/health"
)

func main() {

    url := url.URL{
        Scheme: "http",
        Host:   "localhost:3000",
        Path:   "foo",
    }

    client := httplb.NewClient(
        httplb.WithHealthChecks(
            health.NewPollingChecker(
                health.PollingCheckerConfig{
                    Timeout:         100 * time.Second,
                    PollingInterval: 100 * time.Second,
                },
                health.NewSimpleProber("/healthz"),
            ),
        ),
    )

    for range 5 {
        req, err := http.NewRequest(http.MethodGet, url.String(), http.NoBody)
        if err != nil {
            fmt.Println(err)
            continue
        }

        resp, err := client.Do(req)
        if err != nil {
            fmt.Println(err)
            continue
        }
        fmt.Println(resp.Status)
        time.Sleep(1 * time.Second)
    }
}

the error log

200 OK
Get "http://localhost:3000/foo": unavailable: no healthy connections
Get "http://localhost:3000/foo": unavailable: no healthy connections
Get "http://localhost:3000/foo": unavailable: no healthy connections
Get "http://localhost:3000/foo": unavailable: no healthy connections
humbornjo commented 1 hour ago

btw, I wrote a dummy prober with no use of ctx, and it works well.

humbornjo commented 1 hour ago

Speaking of which, about the wired behavior health when the conn dialing timeout. it should report unhealthy, but in fact it still use the timeout conn. I am still trying to comprehend the code logic of this part, maybe discuss it in another thread.