ClickHouse / clickhouse-go

Golang driver for ClickHouse
Apache License 2.0

When deactivating a clickhouse cluster node, it is not possible to automatically connect to the normal node #1336

Closed Ri0nGo closed 3 months ago

Ri0nGo commented 3 months ago

Observed

I have built a ClickHouse cluster with 2 nodes and connect to it over the HTTP protocol. When I stop one node, queries fail with an error.

Detailed description:

  1. Use clickhouse-go to connect to a ClickHouse cluster with 2 nodes (node1, node2).
  2. When I stop ClickHouse node1, the Go program can no longer connect. Even after a minute it still cannot connect.
  3. When I restart ClickHouse node1, the Go program reconnects.

Expected behaviour

The client should be able to switch to a healthy ClickHouse node.

Code example

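// Imports needed by the snippets below.
import (
    "database/sql"
    "fmt"
    "testing"
    "time"

    "github.com/ClickHouse/clickhouse-go/v2"
)

// Connect to the cluster over HTTP.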
func InitClickHouseClusterWithHttp(hosts []string, username, password, database string) (*sql.DB, error) {
    sqlDb := clickhouse.OpenDB(&clickhouse.Options{
        Addr: hosts,
        Auth: clickhouse.Auth{
            Database: database,
            Username: username,
            Password: password,
        },
        Protocol: clickhouse.HTTP,
    })
    return sqlDb, sqlDb.Ping()
}

// query data
func TestInitClickHouseClusterWithHttp(t *testing.T) {
    hosts := []string{
        "192.168.1.230:8128",
        "192.168.1.231:8129",
    }
    db, err := InitClickHouseClusterWithHttp(hosts, "username", "password", "tmp_db")
    if err != nil {
        fmt.Println(err)
    }

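    for {
        rows, err := db.Query("SELECT uuid, ts, value FROM history_data ORDER BY ts LIMIT 1")
        if err != nil {
            // rows is nil when Query fails, so skip the iteration and retry later.
            fmt.Println(err)
            time.Sleep(1 * time.Second)
            continue
        }
        for rows.Next() {
            var (
                uuid  string
                ts    time.Time
                value float64
            )
            if err = rows.Scan(&uuid, &ts, &value); err != nil {
                fmt.Println(err)
                continue
            }
            fmt.Println(uuid, ts, value)
        }
        // Report any error that ended the iteration early, then release the rows.
        if err = rows.Err(); err != nil {
            fmt.Println(err)
        }
        rows.Close()
        time.Sleep(1 * time.Second)
    }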
}

Error log

Post "http://admin:***@192.168.1.230:8128?database=tmp_db&default_format=Native": dial tcp 192.168.1.230:8128: connectex: No connection could be made because the target machine actively refused it.

Details

Environment

jkaflik commented 3 months ago

Hi @Ri0nGo

The client does not implement a failover mechanism for broken nodes. (This could be implemented as an additional feature.)

I can think of two approaches you can take to make it work as you expect:

  1. Use a load balancer in front of your nodes. It will take care of the failover switch.
  2. Implement custom DialStrategy logic. It's an undocumented feature. Explanation in a PR: https://github.com/ClickHouse/clickhouse-go/pull/855

I would go with the first option as the most robust and resilient solution.