
bitnami/etcd: adding a new node fails and falls into a restart loop #69115

Closed by sinawic 3 months ago

sinawic commented 4 months ago

Name and Version

bitnami/etcd:3.5.14

What architecture are you using?

amd64

What steps will reproduce the bug?

I have the following docker-compose.yml file:

version: '3'
services:
  etcd1:
    image: docker.io/bitnami/etcd:3.5.14
    container_name: etcd1
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_NAME=etcd1
      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd1:2380
      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://etcd1:2379
      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
      - ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380
      - ETCD_INITIAL_CLUSTER_STATE=new
    volumes:
      - ./data/etcd1:/bitnami/etcd
    ports:
      - 23791:2379
      - 23801:2380
  etcd2:
    image: docker.io/bitnami/etcd:3.5.14
    container_name: etcd2
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_NAME=etcd2
      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd2:2380
      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://etcd2:2379
      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
      - ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380
      - ETCD_INITIAL_CLUSTER_STATE=new
    volumes:
      - ./data/etcd2:/bitnami/etcd
    ports:
      - 23792:2379
      - 23802:2380
  etcd3:
    image: docker.io/bitnami/etcd:3.5.14
    container_name: etcd3
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_NAME=etcd3
      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd3:2380
      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://etcd3:2379
      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
      - ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380
      - ETCD_INITIAL_CLUSTER_STATE=new
    volumes:
      - ./data/etcd3:/bitnami/etcd
    ports:
      - 23793:2379
      - 23803:2380
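
For reference, once all three nodes are up, cluster health can be checked with etcdctl from inside any of the containers. A minimal sketch, where the endpoint list simply mirrors the service names above:

# check that every member answers on its client port
docker exec etcd1 etcdctl --endpoints=http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 endpoint health

# list the registered members with their peer and client URLs
docker exec etcd1 etcdctl --endpoints=http://etcd1:2379 member list -w table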

Initially, I start my etcd cluster and everything works as expected. Then, to try adding a new instance, I add the following to the docker-compose.yml file:

  etcd4:
    image: docker.io/bitnami/etcd:3.5.14
    container_name: etcd4
    restart: always
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_NAME=etcd4
      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd4:2380
      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://etcd4:2379
      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
      - ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380,etcd4=http://etcd4:2380
      - ETCD_INITIAL_CLUSTER_STATE=existing
    volumes:
      - ./data/etcd4:/bitnami/etcd
    ports:
      - 23794:2379
      - 23804:2380
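
Worth noting: etcd's runtime reconfiguration requires a new member to be announced to the running cluster before its process joins, which the Bitnami entrypoint appears to attempt automatically (the "Adding new member to existing cluster" step in the logs below). A minimal sketch of the manual equivalent, assuming etcd1 is healthy and reachable:

# register etcd4 with the running cluster before starting its container
docker exec etcd1 etcdctl --endpoints=http://etcd1:2379 member add etcd4 --peer-urls=http://etcd4:2380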

The new instance won't join the existing cluster, and I get the following error in the logs:

etcd4    | etcd 11:08:29.14 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
etcd4    | etcd 11:08:29.14 INFO  ==> Upgrade to Tanzu Application Catalog for production environments to access custom-configured and pre-packaged software components. Gain enhanced features, including Software Bill of Materials (SBOM), CVE scan result reports, and VEX documents. To learn more, visit https://bitnami.com/enterprise
etcd4    | etcd 11:08:29.14 INFO  ==> 
etcd4    | etcd 11:08:29.15 INFO  ==> ** Starting etcd setup **
etcd4    | etcd 11:08:29.16 INFO  ==> Validating settings in ETCD_* env vars..
etcd4    | etcd 11:08:29.17 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd4    | etcd 11:08:29.17 INFO  ==> Initializing etcd
etcd4    | etcd 11:08:29.18 INFO  ==> Generating etcd config file using env variables
etcd4    | etcd 11:08:29.20 INFO  ==> Detected data from previous deployments
etcd4    | /opt/bitnami/scripts/libetcd.sh: line 450: ETCD_ACTIVE_ENDPOINTS: unbound variable
etcd4    | etcd 11:08:29.23 INFO  ==> Adding new member to existing cluster
etcd4    | {"level":"warn","ts":"2024-07-06T11:08:34.244538Z","logger":"etcd-client","caller":"v3@v3.5.14/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000018000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
etcd4    | Error: context deadline exceeded
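
Reading the log, the failure appears to happen in two steps: the init script trips over an unset ETCD_ACTIVE_ENDPOINTS variable (libetcd.sh evidently runs under set -u), and the subsequent member-add call then falls back to the client default 127.0.0.1:2379, where nothing is listening inside the new container yet, hence the connection refused. As an untested workaround sketch, the variable can be defined explicitly on the new node; the comma-separated host:port format is an assumption based on the variable name, not a documented setting:

    environment:
      # assumption: client endpoints of the members already in the cluster,
      # set explicitly because libetcd.sh references this variable while unset
      - ETCD_ACTIVE_ENDPOINTS=etcd1:2379,etcd2:2379,etcd3:2379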

Am I doing anything wrong? I searched and couldn't find anything online that fixes this, and the repo README doesn't provide much documentation about adding new instances to an existing etcd cluster.

Thanks in advance

What is the expected behavior?

When adding a new instance to the existing etcd cluster, the new instance should join the cluster and be detected by the other instances as well.

What do you see instead?

The new instance falls into a restart loop, exiting with the following error:

{"level":"warn","ts":"2024-07-06T11:08:34.244538Z","logger":"etcd-client","caller":"v3@v3.5.14/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000018000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Error: context deadline exceeded

I just can't understand this part: etcd-endpoints://0xc000018000/127.0.0.1:2379. What kind of URL is that, and how and by whom is it created?
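
For what it's worth, that string is not a user-facing URL: it looks like the gRPC resolver target that the etcd Go client builds internally, where etcd-endpoints:// is the client's custom resolver scheme, the hex value is the in-memory address of the client instance (used to keep targets unique), and the trailing part is the endpoint being dialed, here the default 127.0.0.1:2379.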

carrodher commented 4 months ago

The issue may not be directly related to the Bitnami container image/Helm chart, but rather to how the application is being utilized, configured in your specific environment, or tied to a specific scenario that is not easy to reproduce on our side.

If you think that's not the case and are interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have any questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.

sinawic commented 4 months ago

Hey, thanks for the response. In my use case I won't really need to add instances to my cluster for now, since there won't be much load on it (it's used by RabbitMQ for peer discovery).

But I just wanted to test how scaling works, and I ran into this error. I have seen a similar error reported on etcd itself, and I think, as you mentioned, it is related to etcd itself and not the Bitnami package.

Anyway, I would be glad if it could be fixed, but it's not a blocker for my own use case. Thanks.

github-actions[bot] commented 3 months ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 3 months ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we are closing this issue. Do not hesitate to reopen it later if necessary.