etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Failure to get list of machines #18290

Closed: goussous0 closed this issue 2 months ago

goussous0 commented 2 months ago

Bug report criteria

What happened?

I am trying to create a Patroni cluster with etcd. I have two Debian 12.6 machines with the IPs 10.0.0.251 and 10.0.0.250.

What did you expect to happen?

Patroni should be able to talk to etcd through the v3 API. The etcd config sets enable-grpc-gateway: true, and the Patroni config uses the etcd3 section.
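Patroni's etcd3 support appears to reach etcd through its JSON gRPC gateway over plain HTTP rather than native gRPC. The log further down shows URLs ending in /v3. To check the gateway independently of Patroni, here is a minimal sketch of a member-list request. It rests on assumptions, not on Patroni's code:

- The gateway maps the MemberList RPC to POST <base>/cluster/member/list, where <base> is something like http://10.0.0.251:2379/v3.
- The response carries a "members" array with "clientURLs" fields, mirroring the protobuf field names.

The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: etcd's gRPC gateway maps Cluster.MemberList to
# POST <base>/cluster/member/list, where <base> is e.g. http://10.0.0.251:2379/v3.
MEMBER_LIST_PATH = "/cluster/member/list"


def member_list_request(base_url: str) -> urllib.request.Request:
    """Build the POST request for the member list under a /v3 base URL."""
    return urllib.request.Request(
        base_url.rstrip("/") + MEMBER_LIST_PATH,
        data=b"{}",  # an empty JSON object is used as the request body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_client_urls(body: bytes) -> list:
    """Collect every member's client URLs from a member-list JSON body."""
    payload = json.loads(body)
    urls = []
    for member in payload.get("members", []):
        urls.extend(member.get("clientURLs", []))
    return urls


def fetch_client_urls(base_url: str, timeout: float = 5.0) -> list:
    """Query one endpoint; an HTTP error such as 404 raises urllib.error.HTTPError."""
    with urllib.request.urlopen(member_list_request(base_url), timeout=timeout) as resp:
        return parse_client_urls(resp.read())
```

Run fetch_client_urls("http://10.0.0.251:2379/v3") on one of the machines. It should either list the advertised client URLs or raise HTTPError. A 404 there, reproduced without Patroni, would point at the etcd side rather than at Patroni.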

How can we reproduce it (as minimally and precisely as possible)?

  1. Add the etcd config below to /etc/default/etcd on both machines, with the correct IPs.
  2. Create the Patroni config.yml with the correct IPs.
  3. Restart the etcd service on both machines.
  4. Run Patroni: patroni /etc/patroni/config.yml

Anything else we need to know?

Patroni /etc/patroni/config.yml

scope: patroni_test
name: postgresql0

restapi:
  listen: 10.0.0.251:8008
  connect_address: 10.0.0.251:8008

etcd3:
  hosts: 10.0.0.251:2379,10.0.0.250:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        hot_standby: "on"
        wal_keep_segments: 20
        max_wal_senders: 8
        max_replication_slots: 8

  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 10.0.0.0/24 md5
  - host all all 0.0.0.0/0 md5

  # Some additional users which need to be created after initializing the new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.0.0.251:5432
  connect_address: 10.0.0.251:5432
  data_dir: /var/lib/pgsql/data
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: postgres
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  parameters:
    unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
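The etcd3.hosts value above is a comma-separated list of host:port pairs. The log further down shows one http://<host>:<port>/v3 base URL per entry.

Below is a hypothetical helper that reproduces that mapping, so you can list exactly which endpoints should be probed. Several details are assumptions, not taken from Patroni's source:

- The http scheme and the /v3 suffix are read off the log.
- The fallback port 2379 is etcd's default client port.

```python
def gateway_base_urls(hosts: str, scheme: str = "http", default_port: str = "2379") -> list:
    """Hypothetical helper: turn 'h1:p1,h2:p2' into ['http://h1:p1/v3', ...]."""
    urls = []
    for entry in hosts.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No explicit port: fall back to the assumed default client port.
            host, port = entry, default_port
        urls.append(f"{scheme}://{host}:{port}/v3")
    return urls
```

For the config in this issue, gateway_base_urls("10.0.0.251:2379,10.0.0.250:2379") yields the same two URLs that appear in the error lines. Bracketed IPv6 addresses are deliberately not handled in this sketch.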

Etcd version (please run commands below)

```console
$ etcd --version
etcd Version: 3.4.23
Git SHA: Not provided (use ./build instead of go build)
Go Version: go1.19.8
Go OS/Arch: linux/amd64
$ etcdctl version
etcdctl version: 3.4.23
API version: 3.4
```

Etcd configuration (command line flags or environment variables)

name: 'node1'
listen-peer-urls: 'http://10.0.0.251:2380'
listen-client-urls: 'http://10.0.0.251:2379'
initial-advertise-peer-urls: 'http://10.0.0.251:2380'
advertise-client-urls: 'http://10.0.0.251:2379'
initial-cluster: 'node1=http://10.0.0.251:2380,node2=http://10.0.0.250:2380'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster-1'
enable-grpc-gateway: true
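etcd accepts every command-line flag as an environment variable. The name is ETCD_ followed by the flag name in upper case, with dashes turned into underscores; for example, --initial-cluster becomes ETCD_INITIAL_CLUSTER.

The settings above are written as YAML key: value lines. Suppose the service reads /etc/default/etcd as a shell-style environment file. That is an assumption about how the packaged systemd unit is wired, which systemctl cat etcd can confirm. In that case, YAML lines in that file would not be picked up.

Below is a minimal sketch that converts such lines to environment-file form. The alternative is to keep the YAML in its own file and point etcd at it with --config-file.

```python
# Convert flat YAML-style etcd settings ("key: 'value'") into
# environment-file lines ('ETCD_KEY="value"'). Handles one scalar per line
# only; nested YAML, lists, and trailing comments are out of scope.
def to_env_lines(yaml_text: str) -> list:
    env = []
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"not a 'key: value' line: {raw!r}")
        value = value.strip().strip("'\"")  # drop surrounding YAML quotes
        name = "ETCD_" + key.strip().upper().replace("-", "_")
        env.append(f'{name}="{value}"')
    return env
```

Applied to the configuration above, the initial-cluster line, for instance, becomes ETCD_INITIAL_CLUSTER="node1=http://10.0.0.251:2380,node2=http://10.0.0.250:2380".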

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

```console
$ etcdctl member list -w table
+------------------+---------+-------+-----------------------+-----------------------+------------+
| ID               | STATUS  | NAME  | PEER ADDRS            | CLIENT ADDRS          | IS LEARNER |
+------------------+---------+-------+-----------------------+-----------------------+------------+
| 8e9e05c52164694d | started | node1 | http://localhost:2380 | http://localhost:2379 | false      |
+------------------+---------+-------+-----------------------+-----------------------+------------+
$ etcdctl --endpoints= endpoint status -w table
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT        | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 10.0.0.251:2379 | 4ae9849ca223830f | 3.4.23  | 29 kB   | true      | false      | 16        | 50         | 50                 |        |
| 10.0.0.250:2379 | 2d0602d0d19ae514 | 3.4.23  | 25 kB   | false     | false      | 16        | 50         | 50                 |        |
+-----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
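The two tables above do not describe the same cluster at first glance:

- member list was run without --endpoints, so it queried etcdctl's default endpoint, 127.0.0.1:2379. It reports a single member advertising localhost URLs.
- endpoint status reports two members whose IDs both differ from that one.

When comparing outputs like these across machines, a small parser for the -w table format helps. Below is a minimal sketch; the function names are hypothetical, and it assumes cells never contain a | character.

```python
# Parse the ASCII tables printed by `etcdctl ... -w table` into dicts keyed by
# the header row, so member list and endpoint status output can be compared.
def parse_etcdctl_table(text: str) -> list:
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip +----+ border lines
    ]
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def localhost_members(member_table: str) -> list:
    """Names of members whose advertised peer or client URLs mention localhost."""
    return [
        row.get("NAME", "")
        for row in parse_etcdctl_table(member_table)
        if "localhost" in row.get("CLIENT ADDRS", "") or "localhost" in row.get("PEER ADDRS", "")
    ]
```

Feeding it the member list table above returns ["node1"]. Comparing its ID column with the endpoint status ID column makes the mismatch explicit.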

Relevant log output

Running patroni -c /etc/patroni/config.yml results in

2024-07-07 18:29:25,885 INFO: waiting on etcd
2024-07-07 18:29:30,936 ERROR: Failed to get list of machines from http://10.0.0.251:2379/v3: <Unknown error: '404 page not found', code: 2>
2024-07-07 18:29:30,953 ERROR: Failed to get list of machines from http://10.0.0.250:2379/v3: <Unknown error: '404 page not found', code: 2>
2024-07-07 18:29:30,957 INFO: waiting on etcd
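When triaging longer logs like this, it can help to pull out each failing endpoint with its error text and code. The hypothetical helper below is written against the exact message format shown above. Other Patroni versions may word the message differently, so treat the regular expression as an assumption.

```python
import re

# Matches lines such as:
#   ... ERROR: Failed to get list of machines from http://h:2379/v3: <Unknown error: '404 page not found', code: 2>
FAILURE = re.compile(
    r"Failed to get list of machines from (?P<url>\S+?): "
    r"<Unknown error: '(?P<message>[^']*)', code: (?P<code>\d+)>"
)


def failing_endpoints(log_text: str) -> dict:
    """Map each failing base URL to (error message, code) from Patroni log lines."""
    found = {}
    for line in log_text.splitlines():
        match = FAILURE.search(line)
        if match:
            found[match["url"]] = (match["message"], int(match["code"]))
    return found
```

On the four lines above it reports both endpoints with ("404 page not found", 2). Code 2 is the gRPC UNKNOWN status, so the useful detail is the HTTP-level "404 page not found" message.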

ahrtr commented 2 months ago

Could you raise an issue with the Patroni community and ask them to triage it first?

Also, 3.4.23 is a little old; please try a newer version.

goussous0 commented 2 months ago

The highest version I found on new Debian installations was 3.4.23.