apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0

request help: config_etcd.lua has been consistently and frequently reporting errors #2695

Closed Applenice closed 3 years ago

Applenice commented 3 years ago

Issue description

After installing APISIX 2.0, the apisix/logs/error.log file shows that config_etcd.lua has been consistently and frequently reporting errors. The problem persists after reinstalling APISIX and etcd, with no configuration modified in between. Nearly 2,200 lines of error logs were written in about 20 minutes, similar to the following:

2020/11/10 20:18:30 [error] 31545#31545: *7224 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/plugin_metadata, context: ngx.timer
2020/11/10 20:18:30 [error] 31545#31545: *7226 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/proto, context: ngx.timer
2020/11/10 20:18:30 [error] 31545#31545: *7225 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/ssl, context: ngx.timer
2020/11/10 20:18:30 [error] 31545#31545: *7227 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/consumers, context: ngx.timer
2020/11/10 20:18:30 [error] 31545#31545: *7228 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/upstreams, context: ngx.timer
2020/11/10 20:18:30 [error] 31550#31550: *7229 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/services, context: ngx.timer
2020/11/10 20:18:30 [error] 31550#31550: *7232 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/proto, context: ngx.timer
2020/11/10 20:18:30 [error] 31550#31550: *7233 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/global_rules, context: ngx.timer
2020/11/10 20:18:30 [error] 31550#31550: *7231 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/upstreams, context: ngx.timer
2020/11/10 20:18:30 [error] 31550#31550: *7230 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/plugin_metadata, context: ngx.timer

Installation method

yum install -y apisix-2.0-0.el7.noarch.rpm

Environment

nic-chen commented 3 years ago

Hi, does your etcd have auth enabled?

idbeta commented 3 years ago

Can you try to check the etcd data like this?

etcdctl get --prefix "/apisix"
Applenice commented 3 years ago

Hi, does your etcd have auth enabled?

No, nothing is configured; etcd is in a freshly installed state.

Applenice commented 3 years ago

Can you try to check the etcd data like this?

etcdctl get --prefix "/apisix"

No information was returned after execution😐😐

idbeta commented 3 years ago

Can you try to run make init in the APISIX directory?

Applenice commented 3 years ago

Can you try to run make init in the APISIX directory?

The make command is not available.

$ pwd
/usr/local/apisix
$ ls
apisix  client_body_temp  conf  deps  fastcgi_temp  logs  proxy_temp  scgi_temp  uwsgi_temp
$ cd apisix/
$ pwd
/usr/local/apisix/apisix
$ ls
admin  api_router.lua  balancer  balancer.lua  consumer.lua  core  core.lua  debug.lua  discovery  http  init.lua  plugin.lua  plugins  router.lua  schema_def.lua  script.lua  ssl  stream  upstream.lua  utils

No change after trying to execute apisix init and apisix init_etcd😔

$ apisix
Usage: apisix [action] <argument>

help:       show this message, then exit
init:       initialize the local nginx.conf
init_etcd:  initialize the data of etcd
start:      start the apisix server
stop:       stop the apisix server
restart:    restart the apisix server
reload:     reload the apisix server
version:    print the version of apisix
souzens commented 3 years ago

Same error here: APISIX 2.0 can't work now when run in k8s with the apache/apisix:latest docker image,

but apisix-dashboard 2.0rc3 works well

./etcd --version
etcd Version: 3.4.13
Git SHA: ae9734ed2
Go Version: go1.12.17
Go OS/Arch: linux/amd64


./etcdctl --endpoints=10.111.9.154:2379 get --prefix "/apisix"
/apisix/routes/328088132001988967
{"id":"328088132001988967","create_time":1605085365,"update_time":1605086330,"uris":["/*"],"name":"test-pre","methods":["GET","HEAD","POST","PUT","DELETE","OPTIONS","PATCH"],"hosts":["venice.test-pre.com"],"vars":[],"upstream":{"nodes":[{"host":"venice.test-pre.svc.cluster.local","port":80,"weight":1}],"timeout":{"connect":6000,"read":6000,"send":6000},"type":"roundrobin"}}```

/usr/local/apisix $ curl http://127.0.0.1:9080/apisix/admin/routes/328088132001988967 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1'
<html>
<head><title>500 Internal Server Error</title></head>
<body>
<center><h1>500 Internal Server Error</h1></center>
<hr><center>openresty</center>
</body>
</html>

/usr/local/apisix/logs $ tail -n 10 error.log 
2020/11/11 17:57:00 [error] 26#26: *223005 [lua] config_etcd.lua:428: failed to fetch data from etcd: failed to read etcd dir,  etcd key: /apisix/routes, context: ngx.timer
2020/11/11 17:57:07 [error] 32#32: *226586 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/core/etcd.lua:80: attempt to index field 'body' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/apisix/apisix/core/etcd.lua: in function 'get'
        /usr/local/apisix/apisix/admin/routes.lua:166: in function </usr/local/apisix/apisix/admin/routes.lua:160>
        /usr/local/apisix/apisix/admin/init.lua:146: in function 'handler'
        /usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:730: in function 'dispatch'
        /usr/local/apisix/apisix/init.lua:754: in function 'http_admin'
        content_by_lua(nginx.conf:148):2: in main chunk, client: 127.0.0.1, server: , request: "GET /apisix/admin/routes/328088132001988967 HTTP/1.1", host: "127.0.0.1:9080"
idbeta commented 3 years ago

cc @gxthrj Do you have any idea?

souzens commented 3 years ago

Tested just now: APISIX 2.0 reports the error above when run with etcd 3.4.13; with etcd 3.4.9 it runs OK.

moonming commented 3 years ago

Tested just now: APISIX 2.0 reports the error above when run with etcd 3.4.13; with etcd 3.4.9 it runs OK.

@nic-chen please take a look

nic-chen commented 3 years ago

Tested just now: APISIX 2.0 reports the error above when run with etcd 3.4.13; with etcd 3.4.9 it runs OK.

@nic-chen please take a look

working on it.

nic-chen commented 3 years ago

@souzens

Thanks for the feedback.

But it works fine in my environment using etcd 3.4.13. Could you please provide more details? Thanks.

@idbeta please help check. thanks

ziyou434 commented 3 years ago

I also have this problem, and both etcd 3.4.13 and etcd 3.4.9 report errors.

nic-chen commented 3 years ago

I also have this problem, and both etcd 3.4.13 and etcd 3.4.9 report errors.

Thanks for the feedback.

Could you provide the steps and config details? Thanks.

ziyou434 commented 3 years ago

I also have this problem, and both etcd 3.4.13 and etcd 3.4.9 report errors.

Thanks for the feedback.

Could you provide the steps and config details? Thanks.

APISIX 2.0-alpine; etcd installed with: helm install etcd bitnami/etcd -n api-gateway --set auth.rbac.enabled=false --set image.tag=3.4.9

I have no name!@etcd-0:/opt/bitnami/etcd$ etcdctl get --prefix "/apisix"
/apisix/consumers/
init_dir
/apisix/global_rules/
init_dir
/apisix/node_status/
init_dir
/apisix/plugin_metadata/
init_dir
/apisix/plugins/
init_dir
/apisix/proto/
init_dir
/apisix/routes/
init_dir
/apisix/services/
init_dir
/apisix/ssl/
init_dir
/apisix/stream_routes/
init_dir
/apisix/upstreams/
init_dir

tokers commented 3 years ago

@ziyou434 Could you provide the options used to start etcd?

ziyou434 commented 3 years ago

@ziyou434 Could you provide the options used to start etcd?

I use the bitnami/etcd chart with --set auth.rbac.enabled=false. The chart uses setup.sh to start etcd.

setup.sh

#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

# Debug section
exec 3>&1
exec 4>&2

if [[ "${BITNAMI_DEBUG:-false}" = true ]]; then
    echo "==> Bash debug is on"
else
    echo "==> Bash debug is off"
    exec 1>/dev/null
    exec 2>/dev/null
fi

# Constants
HOSTNAME="$(hostname -s)"
AUTH_OPTIONS=""
export ETCDCTL_ENDPOINTS="etcd-0.etcd-headless.api-gateway.svc.cluster.local:2380"
export ROOT_PASSWORD="${ETCD_ROOT_PASSWORD:-}"
if [[ -n "${ETCD_ROOT_PASSWORD:-}" ]]; then
  unset ETCD_ROOT_PASSWORD
fi
# Functions
## Store member id for later member replacement
store_member_id() {
    while ! etcdctl $AUTH_OPTIONS member list; do sleep 1; done
    etcdctl $AUTH_OPTIONS member list | grep -w "$HOSTNAME" | awk '{ print $1}' | awk -F "," '{ print $1}' > "$ETCD_DATA_DIR/member_id"
    echo "==> Stored member id: $(cat ${ETCD_DATA_DIR}/member_id)" 1>&3 2>&4
    exit 0
}
## Configure RBAC
configure_rbac() {
    # When there's more than one replica, we can assume the 1st member
    # to be created is "etcd-0" since a statefulset is used
    if [[ -n "${ROOT_PASSWORD:-}" ]] && [[ "$HOSTNAME" == "etcd-0" ]]; then
        echo "==> Configuring RBAC authentication!" 1>&3 2>&4
        etcd &
        ETCD_PID=$!
        while ! etcdctl $AUTH_OPTIONS member list; do sleep 1; done
        echo "$ROOT_PASSWORD" | etcdctl $AUTH_OPTIONS user add root --interactive=false
        etcdctl $AUTH_OPTIONS auth enable
        kill "$ETCD_PID"
        sleep 5
    fi
}
## Checks whether there was a disaster or not
is_disastrous_failure() {
    local endpoints_array=(${ETCDCTL_ENDPOINTS//,/ })
    local active_endpoints=0
    local -r min_endpoints=$(((1 + 1)/2))

    for e in "${endpoints_array[@]}"; do
        if [[ "$e" != "$ETCD_ADVERTISE_CLIENT_URLS" ]] && (unset -v ETCDCTL_ENDPOINTS; etcdctl $AUTH_OPTIONS  endpoint health --endpoints="$e"); then
            active_endpoints=$((active_endpoints + 1))
        fi
    done
    if [[ $active_endpoints -lt $min_endpoints ]]; then
        true
    else
        false
    fi
}

## Check wether the member was succesfully removed from the cluster
should_add_new_member() {
    return_value=0
    if (grep -E "^Member[[:space:]]+[a-z0-9]+\s+removed\s+from\s+cluster\s+[a-z0-9]+$" "$(dirname "$ETCD_DATA_DIR")/member_removal.log") || \
       ! ([[ -d "$ETCD_DATA_DIR/member/snap" ]] && [[ -f "$ETCD_DATA_DIR/member_id" ]]); then
        rm -rf $ETCD_DATA_DIR/* 1>&3 2>&4
    else
        return_value=1
    fi
    rm -f "$(dirname "$ETCD_DATA_DIR")/member_removal.log" 1>&3 2>&4
    return $return_value
}

if [[ ! -d "$ETCD_DATA_DIR" ]]; then
    echo "==> Creating data dir..." 1>&3 2>&4
    echo "==> There is no data at all. Initializing a new member of the cluster..." 1>&3 2>&4
    store_member_id & 1>&3 2>&4
    configure_rbac
else
    echo "==> Detected data from previous deployments..." 1>&3 2>&4
    if [[ $(stat -c "%a" "$ETCD_DATA_DIR") != *700 ]]; then
        echo "==> Setting data directory permissions to 700 in a recursive way (required in etcd >=3.4.10)" 1>&3 2>&4
        chmod -R 700 $ETCD_DATA_DIR
    else
        echo "==> The data directory is already configured with the proper permissions" 1>&3 2>&4
    fi
    if [[ 1 -eq 1 ]]; then
        echo "==> Single node cluster detected!!" 1>&3 2>&4
    elif is_disastrous_failure; then
        echo "==> Cluster not responding!!" 1>&3 2>&4
        echo "==> Disaster recovery is disabled, the cluster will try to recover on it's own..." 1>&3 2>&4
    elif should_add_new_member; then
        echo "==> Adding new member to existing cluster..." 1>&3 2>&4
        etcdctl $AUTH_OPTIONS member add "$HOSTNAME" --peer-urls="http://${HOSTNAME}.etcd-headless.api-gateway.svc.cluster.local:2380" | grep "^ETCD_" > "$ETCD_DATA_DIR/new_member_envs"
        sed -ie "s/^/export /" "$ETCD_DATA_DIR/new_member_envs"
        echo "==> Loading env vars of existing cluster..." 1>&3 2>&4
        source "$ETCD_DATA_DIR/new_member_envs" 1>&3 2>&4
        store_member_id & 1>&3 2>&4
    else
        echo "==> Updating member in existing cluster..." 1>&3 2>&4
        etcdctl $AUTH_OPTIONS member update "$(cat "$ETCD_DATA_DIR/member_id")" --peer-urls="http://${HOSTNAME}.etcd-headless.api-gateway.svc.cluster.local:2380" 1>&3 2>&4
    fi
fi
exec etcd 1>&3 2>&4
tokers commented 3 years ago

@ziyou434 Could you provide the options used to start etcd?

I use the bitnami/etcd chart with --set auth.rbac.enabled=false. The chart uses setup.sh to start etcd. (Full script quoted above.)

The startup script seems normal; could you paste some etcd logs?

ziyou434 commented 3 years ago

@ziyou434 Could you provide the options used to start etcd?

I use the bitnami/etcd chart with --set auth.rbac.enabled=false. The chart uses setup.sh to start etcd. (Full script quoted above.)

The startup script seems normal; could you paste some etcd logs?

sure.

2020-11-12 03:02:10.270824 I | pkg/flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://etcd-0.etcd-headless.api-gateway.svc.cluster.local:2379
2020-11-12 03:02:10.270943 I | pkg/flags: recognized and used environment variable ETCD_DATA_DIR=/bitnami/etcd/data
2020-11-12 03:02:10.271003 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd-0.etcd-headless.api-gateway.svc.cluster.local:2380
2020-11-12 03:02:10.271030 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
2020-11-12 03:02:10.271049 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
2020-11-12 03:02:10.271073 I | pkg/flags: recognized and used environment variable ETCD_NAME=etcd-0
2020-11-12 03:02:10.271194 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_HOST=172.20.199.58
2020-11-12 03:02:10.271210 W | pkg/flags: unrecognized environment variable ETCD_PORT_2380_TCP_ADDR=172.20.199.58
2020-11-12 03:02:10.271222 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP=tcp://172.20.199.58:2379
2020-11-12 03:02:10.271241 W | pkg/flags: unrecognized environment variable ETCD_PORT_2380_TCP_PROTO=tcp
2020-11-12 03:02:10.271254 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PORT=2379
2020-11-12 03:02:10.271262 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_ADDR=172.20.199.58
2020-11-12 03:02:10.271273 W | pkg/flags: unrecognized environment variable ETCD_PORT_2380_TCP_PORT=2380
2020-11-12 03:02:10.271287 W | pkg/flags: unrecognized environment variable ETCD_PORT_2380_TCP=tcp://172.20.199.58:2380
2020-11-12 03:02:10.271309 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT_CLIENT=2379
2020-11-12 03:02:10.271323 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT_PEER=2380
2020-11-12 03:02:10.271335 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PROTO=tcp
2020-11-12 03:02:10.271350 W | pkg/flags: unrecognized environment variable ETCD_PORT=tcp://172.20.199.58:2379
2020-11-12 03:02:10.271361 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT=2379
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2020-11-12 03:02:10.271400 I | etcdmain: etcd Version: 3.4.9
2020-11-12 03:02:10.271413 I | etcdmain: Git SHA: 54ba95891
2020-11-12 03:02:10.271423 I | etcdmain: Go Version: go1.12.17
2020-11-12 03:02:10.271429 I | etcdmain: Go OS/Arch: linux/amd64
2020-11-12 03:02:10.271439 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2020-11-12 03:02:10.271522 W | etcdmain: found invalid file/dir member_id under data dir /bitnami/etcd/data (Ignore this if you are upgrading etcd)
2020-11-12 03:02:10.271540 N | etcdmain: the server is already initialized as member before, starting as etcd member...
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2020-11-12 03:02:10.271813 I | embed: name = etcd-0
2020-11-12 03:02:10.271836 I | embed: data dir = /bitnami/etcd/data
2020-11-12 03:02:10.271847 I | embed: member dir = /bitnami/etcd/data/member
2020-11-12 03:02:10.271859 I | embed: heartbeat = 100ms
2020-11-12 03:02:10.271865 I | embed: election = 1000ms
2020-11-12 03:02:10.271881 I | embed: snapshot count = 100000
2020-11-12 03:02:10.271895 I | embed: advertise client URLs = http://etcd-0.etcd-headless.api-gateway.svc.cluster.local:2379
2020-11-12 03:02:10.271908 I | embed: initial advertise peer URLs = http://etcd-0.etcd-headless.api-gateway.svc.cluster.local:2380
2020-11-12 03:02:10.271920 I | embed: initial cluster =
2020-11-12 03:02:10.276156 I | etcdserver: restarting member 8ecb0b7cde5e4235 in cluster 2b0eb2956f410bc1 at commit index 107
raft2020/11/12 03:02:10 INFO: 8ecb0b7cde5e4235 switched to configuration voters=()
raft2020/11/12 03:02:10 INFO: 8ecb0b7cde5e4235 became follower at term 4
raft2020/11/12 03:02:10 INFO: newRaft 8ecb0b7cde5e4235 [peers: [], term: 4, commit: 107, applied: 0, lastindex: 107, lastterm: 4]
2020-11-12 03:02:10.279984 W | auth: simple token is not cryptographically signed
2020-11-12 03:02:10.283770 I | etcdserver: starting server... [version: 3.4.9, cluster version: to_be_decided]
raft2020/11/12 03:02:10 INFO: 8ecb0b7cde5e4235 switched to configuration voters=(10289330404592599605)
2020-11-12 03:02:10.284749 I | etcdserver/membership: added member 8ecb0b7cde5e4235 [http://etcd-0.etcd-headless.api-gateway.svc.cluster.local:2380] to cluster 2b0eb2956f410bc1
2020-11-12 03:02:10.285151 N | etcdserver/membership: set the initial cluster version to 3.4
2020-11-12 03:02:10.285318 I | etcdserver/api: enabled capabilities for version 3.4
2020-11-12 03:02:10.287927 I | embed: listening for peers on [::]:2380
raft2020/11/12 03:02:11 INFO: 8ecb0b7cde5e4235 is starting a new election at term 4
raft2020/11/12 03:02:11 INFO: 8ecb0b7cde5e4235 became candidate at term 5
raft2020/11/12 03:02:11 INFO: 8ecb0b7cde5e4235 received MsgVoteResp from 8ecb0b7cde5e4235 at term 5
raft2020/11/12 03:02:11 INFO: 8ecb0b7cde5e4235 became leader at term 5
raft2020/11/12 03:02:11 INFO: raft.node: 8ecb0b7cde5e4235 elected leader 8ecb0b7cde5e4235 at term 5
2020-11-12 03:02:11.477491 I | etcdserver: published {Name:etcd-0 ClientURLs:[http://etcd-0.etcd-headless.api-gateway.svc.cluster.local:2379]} to cluster 2b0eb2956f410bc1
2020-11-12 03:02:11.477626 I | embed: ready to serve client requests
2020-11-12 03:02:11.478714 N | embed: serving insecure client requests on [::]:2379, this is strongly discouraged!

souzens commented 3 years ago

@souzens

Thanks for the feedback.

But it works fine in my environment using etcd 3.4.13. Could you please provide more details? Thanks.

@idbeta please help check. thanks

Did you deploy etcd in single mode or cluster mode?

I tested that APISIX 2.0 works OK with etcd in single mode:

nohup /home/admin/etcd-v3.4.9-linux-amd64/etcd --data-dir /home/admin/etcd/data --listen-client-urls http://10.111.9.155:2379 --advertise-client-urls http://10.111.9.155:2379 >> /home/admin/etcd.log 2>&1 &

But in cluster mode it reports errors.

etcd.conf

name: etcd@10.111.9.155
data-dir: /home/admin/etcd/data
listen-peer-urls: http://10.111.9.155:2380
listen-client-urls: http://10.111.9.155:2379
advertise-client-urls: http://10.111.9.155:2379
listen-peer-urls: http://10.111.9.155:2380
initial-advertise-peer-urls: http://10.111.9.155:2380
initial-cluster-token: etcd-cluster-token
initial-cluster-state: new
initial-cluster: etcd@10.111.9.154=http://10.111.9.154:2380,etcd@10.111.21.245=http://10.111.21.245:2380,etcd@10.111.9.155=http://10.111.9.155:2380
nic-chen commented 3 years ago

@souzens Thanks for the feedback. But it works fine in my environment using etcd 3.4.13. Could you please provide more details? Thanks. @idbeta please help check. Thanks.

Did you deploy etcd in single mode or cluster mode?

I tested that APISIX 2.0 works OK with etcd in single mode:

nohup /home/admin/etcd-v3.4.9-linux-amd64/etcd --data-dir /home/admin/etcd/data --listen-client-urls http://10.111.9.155:2379 --advertise-client-urls http://10.111.9.155:2379 >> /home/admin/etcd.log 2>&1 &

But in cluster mode it reports errors.

etcd.conf

name: etcd@10.111.9.155
data-dir: /home/admin/etcd/data
listen-peer-urls: http://10.111.9.155:2380
listen-client-urls: http://10.111.9.155:2379
advertise-client-urls: http://10.111.9.155:2379
listen-peer-urls: http://10.111.9.155:2380
initial-advertise-peer-urls: http://10.111.9.155:2380
initial-cluster-token: etcd-cluster-token
initial-cluster-state: new
initial-cluster: etcd@10.111.9.154=http://10.111.9.154:2380,etcd@10.111.21.245=http://10.111.21.245:2380,etcd@10.111.9.155=http://10.111.9.155:2380

I tried an etcd cluster; it works fine too.

Please confirm your etcd cluster is fully ready and that all nodes work well.
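One way to run that check, sketched with the endpoint addresses from the cluster config above (adjust them to your deployment; requires etcdctl with the v3 API):

```shell
# Verify the health and membership of every etcd node APISIX will talk to.
ENDPOINTS="10.111.9.154:2379,10.111.21.245:2379,10.111.9.155:2379"
etcdctl --endpoints="$ENDPOINTS" endpoint health || echo "some endpoints unhealthy"
etcdctl --endpoints="$ENDPOINTS" endpoint status --write-out=table || true
etcdctl --endpoints="$ENDPOINTS" member list || true
```

If any node reports unhealthy, or member list shows fewer members than expected, APISIX's watchers can fail with the "failed to read etcd dir" errors seen above.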

yankunsam commented 3 years ago

Tested just now: APISIX 2.0 reports the error above when run with etcd 3.4.13; with etcd 3.4.9 it runs OK.

docker.io/bitnami/etcd:3.4.9-debian-10-r34? The same errors here.

tokers commented 3 years ago

That's strange. Could you use tcpdump in your environment? Let's capture some HTTP packets between APISIX and etcd to see whether the body is abnormal.

yankunsam commented 3 years ago

How do I tell APISIX the password of etcd?

yankunsam commented 3 years ago

etcdserver: failed to apply request "header: put:<key:\"/apisix/proto/\" value_size:8 >" with response "" took (1.669µs) to execute, err is auth: user name is empty
2020-11-12 06:44:28.061760 W | etcdserver: failed to apply request "header: put:<key:\"/apisix/plugin_metadata/\" value_size:8 >" with response "" took (1.479µs) to execute, err is auth: user name is empty

tokers commented 3 years ago

How do I tell APISIX the password of etcd?

See https://github.com/apache/apisix/blob/master/conf/config-default.yaml for the details.
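For reference, the relevant section of conf/config.yaml looks roughly like this (field names follow config-default.yaml; the endpoint and credentials are placeholders you must replace with your own):

```yaml
etcd:
  host:                       # etcd address(es) APISIX connects to
    - "http://127.0.0.1:2379"
  prefix: "/apisix"           # key prefix APISIX uses in etcd
  timeout: 30                 # in seconds
  user: root                  # etcd auth user, only needed when auth is enabled
  password: "change-me"       # placeholder; use your real etcd password
```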

csh995426531 commented 3 years ago

Me too, with etcd version 3.4.13.

ziyou434 commented 3 years ago

I find that etcdctl get /apisix/services returns null: the key exists but the value is null. /apisix/routes... is the same. I guess errors may have occurred in admin_init, but I haven't found where.

tokers commented 3 years ago

That's strange. Could you use tcpdump in your environment? Let's capture some HTTP packets between APISIX and etcd to see whether the body is abnormal.

@ziyou434 Could you also provide the config.yaml of APISIX.

yankunsam commented 3 years ago

How do I tell APISIX the password of etcd?

See https://github.com/apache/apisix/blob/master/conf/config-default.yaml for the details.

Following the guide document, I set the user/password for etcd, but I get the same error.

ziyou434 commented 3 years ago

That's strange. Could you use tcpdump in your environment? Let's capture some HTTP packets between APISIX and etcd to see whether the body is abnormal.

@ziyou434 Could you also provide the config.yaml of APISIX.

apisix:
  node_listen: 9080              # APISIX listening port
  enable_admin: true
  enable_admin_cors: true         # Admin API support CORS response headers.
  enable_debug: false
  enable_dev_mode: false          # Sets nginx worker_processes to 1 if set to true
  enable_reuseport: true          # Enable nginx SO_REUSEPORT switch if set to true.
  enable_ipv6: true
  config_center: etcd             # etcd: use etcd to store the config value
                                  # yaml: fetch the config value from local yaml file `/your_path/conf/apisix.yaml`

  #proxy_protocol:                 # Proxy Protocol configuration
  #  listen_http_port: 9181        # The port with proxy protocol for http, it differs from node_listen and port_admin.
                                   # This port can only receive http request with proxy protocol, but node_listen & port_admin
                                   # can only receive http request. If you enable proxy protocol, you must use this port to
                                   # receive http request with proxy protocol
  #  listen_https_port: 9182       # The port with proxy protocol for https
  #  enable_tcp_pp: true           # Enable the proxy protocol for tcp proxy, it works for stream_proxy.tcp option
  #  enable_tcp_pp_to_upstream: true # Enables the proxy protocol to the upstream server

  enable_server_tokens: true       # Whether the APISIX version number should be shown in Server header.
                                   # It's enabled by default.

  proxy_cache:                     # Proxy Caching configuration
    cache_ttl: 10s                 # The default caching time if the upstream does not specify the cache time
    zones:                         # The parameters of a cache
    - name: disk_cache_one         # The name of the cache, the administrator can specify
                                   # which cache to use by name in the admin api
      memory_size: 50m             # The size of shared memory, it's used to store the cache index
      disk_size: 1G                # The size of disk, it's used to store the cache data
      disk_path: "/tmp/disk_cache_one" # The path to store the cache data
      cache_levels: "1:2"           # The hierarchy levels of a cache
  #  - name: disk_cache_two
  #    memory_size: 50m
  #    disk_size: 1G
  #    disk_path: "/tmp/disk_cache_two"
  #    cache_levels: "1:2"

  allow_admin:                  # http://nginx.org/en/docs/http/ngx_http_access_module.html#allow
    - 127.0.0.0/24              # If we don't set any IP list, then any IP access is allowed by default.
  #   - "::/64"
  # port_admin: 9180              # use a separate port
  # https_admin: true             # enable HTTPS when use a separate port for Admin API.
                                # Admin API will use conf/apisix_admin_api.crt and conf/apisix_admin_api.key as certificate.
  admin_api_mtls:               # Depends on `port_admin` and `https_admin`.
    admin_ssl_cert: ""             # Path of your self-signed server side cert.
    admin_ssl_cert_key: ""         # Path of your self-signed server side key.
    admin_ssl_ca_cert: ""          # Path of your self-signed ca cert.The CA is used to sign all admin api callers' certificates.

  # Default token when use API to call for Admin API.
  # *NOTE*: Highly recommended to modify this value to protect APISIX's Admin API.
  # Disabling this configuration item means that the Admin API does not
  # require any authentication.
  admin_key:
    -
      name: "admin"
      key: edd1c9f034335f136f87ad84b625c8f1
      role: admin                 # admin: manage all configuration data
                                  # viewer: only can view configuration data
    -
      name: "viewer"
      key: 4054f7cf07e344346cd3f287985e76a2
      role: viewer

  delete_uri_tail_slash: false    # delete the '/' at the end of the URI
  router:
    http: 'radixtree_uri'         # radixtree_uri: match route by uri(base on radixtree)
                                  # radixtree_host_uri: match route by host + uri(base on radixtree)
    ssl: 'radixtree_sni'          # radixtree_sni: match route by SNI(base on radixtree)
  # stream_proxy:                 # TCP/UDP proxy
  #   tcp:                        # TCP proxy port list
  #     - 9100
  #     - 9101
  #   udp:                        # UDP proxy port list
  #     - 9200
  #     - 9211
  dns_resolver:                   # If not set, read from `/etc/resolv.conf`
    - 172.20.0.10
  #  - 1.1.1.1
  #  - 8.8.8.8
  dns_resolver_valid: 30          # valid time for dns result 30 seconds
  resolver_timeout: 5             # resolver timeout
  ssl:
    enable: true
    enable_http2: true
    listen_port: 9443
    # ssl_trusted_certificate: /path/to/ca-cert # Specifies a file path with trusted CA certificates in the PEM format
                                                # used to verify the certificate when APISIX needs to do SSL/TLS handshaking
                                                # with external services (e.g. etcd)
    ssl_protocols: "TLSv1.2 TLSv1.3"
    ssl_ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384"
    ssl_session_tickets: false              #  disable ssl_session_tickets by default for 'ssl_session_tickets' would make Perfect Forward Secrecy useless.
                                            #  ref: https://github.com/mozilla/server-side-tls/issues/135
    key_encrypt_salt: "edd1c9f0985e76a2"    #  If not set, will save origin ssl key into etcd.
                                            #  If set this, must be a string of length 16. And it will encrypt ssl key with AES-128-CBC
                                            #  !!! So do not change it after saving your ssl, it can't decrypt the ssl keys have be saved if you change !!
nginx_config:                     # config for rendering the template to generate nginx.conf
  error_log: "/dev/stdout"
  error_log_level: "warn"         # warn,error
  worker_processes: auto          # one worker will get best performance, you can use "auto", but remember it is just work well only on physical machine
                                  # no more than 8 workers, otherwise competition between workers will consume a lot of resources
                                  # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
  enable_cpu_affinity: true       # enable cpu affinity, this works well only on a physical machine
  worker_rlimit_nofile: 20480     # the number of files a worker process can open, should be larger than worker_connections
  worker_shutdown_timeout: 240s     # timeout for a graceful shutdown of worker processes
  event:
    worker_connections: 10620
  #envs:                            # allow to get a list of environment variables
  #  - TEST_ENV
  http:
    access_log: "/dev/stdout"
    access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time"
    access_log_format_escape: default       # allows setting json or default characters escaping in variables
    keepalive_timeout: 60s         # timeout during which a keep-alive client connection will stay open on the server side.
    client_header_timeout: 60s     # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
    client_body_timeout: 60s       # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
    client_max_body_size: 0        # The maximum allowed size of the client request body.
                                   # If exceeded, the 413 (Request Entity Too Large) error is returned to the client.
                                   # Note that unlike Nginx, we don't limit the body size by default.

    send_timeout: 10s              # timeout for transmitting a response to the client.then the connection is closed
    underscores_in_headers: "on"   # default enables the use of underscores in client request header fields
    real_ip_header: "X-Real-IP"    # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_from:                  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
      - 127.0.0.1
      - 'unix:'
    #lua_shared_dicts:              # add custom shared cache to nginx.conf
    #  ipc_shared_dict: 100m        # custom shared cache, format: `cache-key: cache-size`

etcd:
  host:                           # it's possible to define multiple etcd hosts addresses of the same etcd cluster.
    - "http://etcd-headless.api-gateway.svc.cluster.local:2379"     # multiple etcd address, if your etcd cluster enables TLS, please use https scheme,
                                  # e.g. "https://127.0.0.1:2379".
  prefix: "/apisix"               # apisix configurations prefix
  timeout: 30                     # 30 seconds
  #user: root                      # root username for etcd
  #password: 2xJIUecrUa            # root password for etcd
  tls:
      verify: true                # whether to verify the etcd endpoint certificate when setup a TLS connection to etcd,
                                  # the default value is true, e.g. the certificate will be verified strictly.

# discovery:                          # service discovery center
#   eureka:
#     host:                           # it's possible to define multiple eureka hosts addresses of the same eureka cluster.
#       - "http://127.0.0.1:8761"
#     prefix: "/eureka/"
#     fetch_interval: 30              # default 30s
#     weight: 100                     # default weight for node
#     timeout:
#       connect: 2000                 # default 2000ms
#       send: 2000                    # default 2000ms
#       read: 5000                    # default 5000ms

plugins:                          # plugin list
  #- example-plugin
  - limit-req
  - limit-count
  - limit-conn
  - key-auth
  - basic-auth
  - prometheus
  - node-status
  - jwt-auth
  - zipkin
  - ip-restriction
  - referer-restriction
  - grpc-transcode
  - serverless-pre-function
  - serverless-post-function
  - openid-connect
  - proxy-rewrite
  - redirect
  - response-rewrite
  - fault-injection
  - udp-logger
  - wolf-rbac
  - tcp-logger
  - kafka-logger
  - cors
  - consumer-restriction
  - syslog
  - batch-requests
  - http-logger
  #- skywalking
  - echo
  - authz-keycloak
  - uri-blocker
  - request-validation
  - proxy-cache
  - proxy-mirror
  - request-id
  - hmac-auth
  - api-breaker

stream_plugins:
  - mqtt-proxy

plugin_attr:
  log-rotate:
    interval: 3600    # rotate interval (unit: second)
    max_kept: 168     # max number of log files will be kept
  skywalking:
    service_name: APISIX
    service_instance_name: "APISIX Instance Name"
    endpoint_addr: http://127.0.0.1:12800
souzens commented 3 years ago
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# PLEASE DO NOT UPDATE THIS FILE!
# If you want to set the specified configuration value, you can set the new
# value in the conf/config.yaml file.
#

apisix:
  node_listen: 9080              # APISIX listening port
  enable_admin: true
  enable_admin_cors: true         # Admin API support CORS response headers.
  enable_debug: true
  enable_dev_mode: false          # Sets nginx worker_processes to 1 if set to true
  enable_reuseport: true          # Enable nginx SO_REUSEPORT switch if set to true.
  enable_ipv6: true
  config_center: etcd             # etcd: use etcd to store the config value
                                  # yaml: fetch the config value from local yaml file `/your_path/conf/apisix.yaml`

  #proxy_protocol:                 # Proxy Protocol configuration
  #  listen_http_port: 9181        # The port with proxy protocol for http, it differs from node_listen and port_admin.
                                   # This port can only receive http request with proxy protocol, but node_listen & port_admin
                                   # can only receive http request. If you enable proxy protocol, you must use this port to
                                   # receive http request with proxy protocol
  #  listen_https_port: 9182       # The port with proxy protocol for https
  #  enable_tcp_pp: true           # Enable the proxy protocol for tcp proxy, it works for stream_proxy.tcp option
  #  enable_tcp_pp_to_upstream: true # Enables the proxy protocol to the upstream server

  proxy_cache:                     # Proxy Caching configuration
    cache_ttl: 10s                 # The default caching time if the upstream does not specify the cache time
    zones:                         # The parameters of a cache
    - name: disk_cache_one         # The name of the cache, the administrator can specify
                                   # which cache to use by name in the admin api
      memory_size: 50m             # The size of shared memory, it's used to store the cache index
      disk_size: 1G                # The size of disk, it's used to store the cache data
      disk_path: "/tmp/disk_cache_one" # The path to store the cache data
      cache_levels: "1:2"           # The hierarchy levels of a cache
  #  - name: disk_cache_two
  #    memory_size: 50m
  #    disk_size: 1G
  #    disk_path: "/tmp/disk_cache_two"
  #    cache_levels: "1:2"

  # allow_admin:                  # http://nginx.org/en/docs/http/ngx_http_access_module.html#allow
  #  - 127.0.0.0/24              # If we don't set any IP list, then any IP access is allowed by default.
  #   - "::/64"
  # port_admin: 9180              # use a separate port
  # https_admin: true             # enable HTTPS when use a separate port for Admin API.
                                # Admin API will use conf/apisix_admin_api.crt and conf/apisix_admin_api.key as certificate.
  admin_api_mtls:               # Depends on `port_admin` and `https_admin`.
    admin_ssl_cert: ""             # Path of your self-signed server side cert.
    admin_ssl_cert_key: ""         # Path of your self-signed server side key.
    admin_ssl_ca_cert: ""          # Path of your self-signed ca cert.The CA is used to sign all admin api callers' certificates.

  # Default token when use API to call for Admin API.
  # *NOTE*: Highly recommended to modify this value to protect APISIX's Admin API.
  # Disabling this configuration item means that the Admin API does not
  # require any authentication.
  admin_key:
    -
      name: "admin"
      key: edd1c9f034335f136f87ad84b625c8f1
      role: admin                 # admin: manage all configuration data
                                  # viewer: only can view configuration data
    -
      name: "viewer"
      key: 4054f7cf07e344346cd3f287985e76a2
      role: viewer

  delete_uri_tail_slash: false    # delete the '/' at the end of the URI
  router:
    http: 'radixtree_uri'         # radixtree_uri: match route by uri(base on radixtree)
                                  # radixtree_host_uri: match route by host + uri(base on radixtree)
    ssl: 'radixtree_sni'          # radixtree_sni: match route by SNI(base on radixtree)
  # stream_proxy:                 # TCP/UDP proxy
  #   tcp:                        # TCP proxy port list
  #     - 9100
  #     - 9101
  #   udp:                        # UDP proxy port list
  #     - 9200
  #     - 9211
  # dns_resolver:                   # If not set, read from `/etc/resolv.conf`
  #  - 1.1.1.1
  #  - 8.8.8.8
  dns_resolver_valid: 30          # valid time for dns result 30 seconds
  resolver_timeout: 5             # resolver timeout
  ssl:
    enable: true
    enable_http2: true
    listen_port: 9443
    # ssl_trusted_certificate: /path/to/ca-cert # Specifies a file path with trusted CA certificates in the PEM format
                                                # used to verify the certificate when APISIX needs to do SSL/TLS handshaking
                                                # with external services (e.g. etcd)
    ssl_protocols: "TLSv1.2 TLSv1.3"
    ssl_ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384"
    key_encrypt_salt: "edd1c9f0985e76a2"    #  If not set, will save origin ssl key into etcd.
                                            #  If set this, must be a string of length 16. And it will encrypt ssl key with AES-128-CBC
                                            #  !!! So do not change it after saving your ssl, it can't decrypt the ssl keys have be saved if you change !!
#  discovery: eureka               # service discovery center
nginx_config:                     # config for rendering the template to generate nginx.conf
  error_log: "logs/error.log"
  error_log_level: "warn"         # warn,error
  worker_processes: auto          # one worker will get best performance, you can use "auto", but remember it is just work well only on physical machine
                                  # no more than 8 workers, otherwise competition between workers will consume a lot of resources
                                  # if you want use multiple cores in container, you can inject the number of cpu as environment variable "APISIX_WORKER_PROCESSES"
  enable_cpu_affinity: true       # enable cpu affinity, this works well only on a physical machine
  worker_rlimit_nofile: 20480     # the number of files a worker process can open, should be larger than worker_connections
  worker_shutdown_timeout: 240s     # timeout for a graceful shutdown of worker processes
  event:
    worker_connections: 10620
  #envs:                            # allow to get a list of environment variables
  #  - TEST_ENV
  http:
    access_log: "logs/access.log"
    access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time"
    access_log_format_escape: default       # allows setting json or default characters escaping in variables
    keepalive_timeout: 60s         # timeout during which a keep-alive client connection will stay open on the server side.
    client_header_timeout: 60s     # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
    client_body_timeout: 60s       # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
    client_max_body_size: 0        # The maximum allowed size of the client request body.
                                   # If exceeded, the 413 (Request Entity Too Large) error is returned to the client.
                                   # Note that unlike Nginx, we don't limit the body size by default.

    send_timeout: 10s              # timeout for transmitting a response to the client.then the connection is closed
    underscores_in_headers: "on"   # default enables the use of underscores in client request header fields
    real_ip_header: "X-Real-IP"    # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_from:                  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
      - 10.111.0.0/16
      - 'unix:'
    #lua_shared_dicts:              # add custom shared cache to nginx.conf
    #  ipc_shared_dict: 100m        # custom shared cache, format: `cache-key: cache-size`

etcd:
  host:                           # it's possible to define multiple etcd hosts addresses of the same etcd cluster.
    - "http://10.111.9.155:2379"
  prefix: "/apisix"               # apisix configurations prefix
  timeout: 30                     # 30 seconds
  # user: root                     # root username for etcd
  # password: 5tHkHhYkjr6cQY        # root password for etcd
#eureka:
#  host:                           # it's possible to define multiple eureka hosts addresses of the same eureka cluster.
#    - "http://127.0.0.1:8761"
#  prefix: "/eureka/"
#  fetch_interval: 30              # default 30s
#  weight: 100                     # default weight for node
#  timeout:
#    connect: 2000                 # default 2000ms
#    send: 2000                    # default 2000ms
#    read: 5000                    # default 5000ms

plugins:                          # plugin list
  - example-plugin
  - limit-req
  - limit-count
  - limit-conn
  - key-auth
  - basic-auth
  - prometheus
  - node-status
  - jwt-auth
  - zipkin
  - ip-restriction
  - referer-restriction
  - grpc-transcode
  - serverless-pre-function
  - serverless-post-function
  - openid-connect
  - proxy-rewrite
  - redirect
  - response-rewrite
  - fault-injection
  - udp-logger
  - wolf-rbac
  - tcp-logger
  - kafka-logger
  - cors
  - consumer-restriction
  - syslog
  - batch-requests
  - http-logger
  - echo
  - authz-keycloak
  - uri-blocker
  - request-validation
  - proxy-cache
  - proxy-mirror
  - request-id
  - hmac-auth
  - api-breaker

stream_plugins:
  - mqtt-proxy

plugin_attr:
  log-rotate:
    interval: 3600    # rotate interval (unit: second)
    max_kept: 168     # max number of log files will be kept
ziyou434 commented 3 years ago

That's strange. Could you use tcpdump in your environment to capture some HTTP packets between APISIX and etcd, so we can see whether the body is abnormal?

@ziyou434 Could you also provide the config.yaml of APISIX?

I find APISIX uses a method named "set" in core.etcd.lua, but etcd only has `etcdctl put`. Is this a problem?

tokers commented 3 years ago

That's strange. Could you use tcpdump in your environment to capture some HTTP packets between APISIX and etcd, so we can see whether the body is abnormal?

@ziyou434 Could you also provide the config.yaml of APISIX?

I find APISIX uses a method named "set" in core.etcd.lua, but etcd only has `etcdctl put`. Is this a problem?

Well, that's just naming; both of them are translated into an HTTP PUT request.
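For what it's worth, the request behind both names can be sketched against etcd's v3 gRPC-JSON gateway. The endpoint address here is an assumption (substitute your etcd host), the gateway path differs by version (`/v3` on etcd 3.4, `/v3beta` on older 3.3 releases), and the `init_dir` marker value is inferred from the `value_size:8` in the log above; keys and values must be base64-encoded:

```shell
# Keys and values in the etcd v3 JSON gateway are base64-encoded strings.
KEY=$(printf '/apisix/routes' | base64)    # L2FwaXNpeC9yb3V0ZXM=
VALUE=$(printf 'init_dir' | base64)        # an assumed 8-byte directory marker

# etcdctl "put" and APISIX's "set" helper both end up as a request like this
# (the "|| true" keeps the sketch harmless when no etcd is reachable):
curl -s "http://127.0.0.1:2379/v3/kv/put" \
     -d "{\"key\": \"$KEY\", \"value\": \"$VALUE\"}" || true
```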

tokers commented 3 years ago

@ziyou434 @souzens @yankunsam So all of you deploy APISIX and etcd in Kubernetes?

ziyou434 commented 3 years ago

@ziyou434 @souzens @yankunsam So all of you deploy APISIX and etcd in Kubernetes?

Yes, it's required by my company.

tokers commented 3 years ago

OK, I will try to reproduce it in my minikube.

idbeta commented 3 years ago

@souzens

Thanks for the feedback.

But it works fine in my environment using etcd 3.4.13. Could you please provide more details? Thanks.

@idbeta please help check. thanks

I cannot reproduce this problem in my environment for now.

souzens commented 3 years ago

@ziyou434 @souzens @yankunsam So all of you deploy APISIX and etcd in Kubernetes?

Yes. At the beginning APISIX was in OpenShift and etcd was deployed in OpenShift too, but it produced many errors, so I tried deploying etcd in a VM.

Whether in k8s or in a VM, etcd runs fine, because apisix-dashboard can read and write data normally, and the etcdctl command also works.

ziyou434 commented 3 years ago

@ziyou434 @souzens @yankunsam So all of you deploy APISIX and etcd in Kubernetes?

Yes. At the beginning APISIX was in OpenShift and etcd was deployed in OpenShift too, but it produced many errors, so I tried deploying etcd in a VM.

Whether in k8s or in a VM, etcd runs fine, because apisix-dashboard can read and write data normally, and the etcdctl command also works.

I use APISIX on Rancher.

ziyou434 commented 3 years ago

@ziyou434 @souzens @yankunsam So all of you deploy APISIX and etcd in Kubernetes?

Yes. At the beginning APISIX was in OpenShift and etcd was deployed in OpenShift too, but it produced many errors, so I tried deploying etcd in a VM.

Whether in k8s or in a VM, etcd runs fine, because apisix-dashboard can read and write data normally, and the etcdctl command also works.

In fact, there is no problem with etcd; the problem is in the APISIX log.

tokers commented 3 years ago

@ziyou434 Could you provide the options that were used to start etcd?

I use the bitnami/etcd chart with `--set auth.rbac.enabled=false`. The chart uses setup.sh to start etcd:

setup.sh

#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

# Debug section
exec 3>&1
exec 4>&2

if [[ "${BITNAMI_DEBUG:-false}" = true ]]; then
    echo "==> Bash debug is on"
else
    echo "==> Bash debug is off"
    exec 1>/dev/null
    exec 2>/dev/null
fi

# Constants
HOSTNAME="$(hostname -s)"
AUTH_OPTIONS=""
export ETCDCTL_ENDPOINTS="etcd-0.etcd-headless.api-gateway.svc.cluster.local:2380"
export ROOT_PASSWORD="${ETCD_ROOT_PASSWORD:-}"
if [[ -n "${ETCD_ROOT_PASSWORD:-}" ]]; then
  unset ETCD_ROOT_PASSWORD
fi
# Functions
## Store member id for later member replacement
store_member_id() {
    while ! etcdctl $AUTH_OPTIONS member list; do sleep 1; done
    etcdctl $AUTH_OPTIONS member list | grep -w "$HOSTNAME" | awk '{ print $1}' | awk -F "," '{ print $1}' > "$ETCD_DATA_DIR/member_id"
    echo "==> Stored member id: $(cat ${ETCD_DATA_DIR}/member_id)" 1>&3 2>&4
    exit 0
}
## Configure RBAC
configure_rbac() {
    # When there's more than one replica, we can assume the 1st member
    # to be created is "etcd-0" since a statefulset is used
    if [[ -n "${ROOT_PASSWORD:-}" ]] && [[ "$HOSTNAME" == "etcd-0" ]]; then
        echo "==> Configuring RBAC authentication!" 1>&3 2>&4
        etcd &
        ETCD_PID=$!
        while ! etcdctl $AUTH_OPTIONS member list; do sleep 1; done
        echo "$ROOT_PASSWORD" | etcdctl $AUTH_OPTIONS user add root --interactive=false
        etcdctl $AUTH_OPTIONS auth enable
        kill "$ETCD_PID"
        sleep 5
    fi
}
## Checks whether there was a disaster or not
is_disastrous_failure() {
    local endpoints_array=(${ETCDCTL_ENDPOINTS//,/ })
    local active_endpoints=0
    local -r min_endpoints=$(((1 + 1)/2))

    for e in "${endpoints_array[@]}"; do
        if [[ "$e" != "$ETCD_ADVERTISE_CLIENT_URLS" ]] && (unset -v ETCDCTL_ENDPOINTS; etcdctl $AUTH_OPTIONS  endpoint health --endpoints="$e"); then
            active_endpoints=$((active_endpoints + 1))
        fi
    done
    if [[ $active_endpoints -lt $min_endpoints ]]; then
        true
    else
        false
    fi
}

## Check wether the member was succesfully removed from the cluster
should_add_new_member() {
    return_value=0
    if (grep -E "^Member[[:space:]]+[a-z0-9]+\s+removed\s+from\s+cluster\s+[a-z0-9]+$" "$(dirname "$ETCD_DATA_DIR")/member_removal.log") || \
       ! ([[ -d "$ETCD_DATA_DIR/member/snap" ]] && [[ -f "$ETCD_DATA_DIR/member_id" ]]); then
        rm -rf $ETCD_DATA_DIR/* 1>&3 2>&4
    else
        return_value=1
    fi
    rm -f "$(dirname "$ETCD_DATA_DIR")/member_removal.log" 1>&3 2>&4
    return $return_value
}

if [[ ! -d "$ETCD_DATA_DIR" ]]; then
    echo "==> Creating data dir..." 1>&3 2>&4
    echo "==> There is no data at all. Initializing a new member of the cluster..." 1>&3 2>&4
    store_member_id & 1>&3 2>&4
    configure_rbac
else
    echo "==> Detected data from previous deployments..." 1>&3 2>&4
    if [[ $(stat -c "%a" "$ETCD_DATA_DIR") != *700 ]]; then
        echo "==> Setting data directory permissions to 700 in a recursive way (required in etcd >=3.4.10)" 1>&3 2>&4
        chmod -R 700 $ETCD_DATA_DIR
    else
        echo "==> The data directory is already configured with the proper permissions" 1>&3 2>&4
    fi
    if [[ 1 -eq 1 ]]; then
        echo "==> Single node cluster detected!!" 1>&3 2>&4
    elif is_disastrous_failure; then
        echo "==> Cluster not responding!!" 1>&3 2>&4
        echo "==> Disaster recovery is disabled, the cluster will try to recover on it's own..." 1>&3 2>&4
    elif should_add_new_member; then
        echo "==> Adding new member to existing cluster..." 1>&3 2>&4
        etcdctl $AUTH_OPTIONS member add "$HOSTNAME" --peer-urls="http://${HOSTNAME}.etcd-headless.api-gateway.svc.cluster.local:2380" | grep "^ETCD_" > "$ETCD_DATA_DIR/new_member_envs"
        sed -ie "s/^/export /" "$ETCD_DATA_DIR/new_member_envs"
        echo "==> Loading env vars of existing cluster..." 1>&3 2>&4
        source "$ETCD_DATA_DIR/new_member_envs" 1>&3 2>&4
        store_member_id & 1>&3 2>&4
    else
        echo "==> Updating member in existing cluster..." 1>&3 2>&4
        etcdctl $AUTH_OPTIONS member update "$(cat "$ETCD_DATA_DIR/member_id")" --peer-urls="http://${HOSTNAME}.etcd-headless.api-gateway.svc.cluster.local:2380" 1>&3 2>&4
    fi
fi
exec etcd 1>&3 2>&4

Could you log into the etcd container and execute `ps aux | grep etcd` to see its options?

ziyou434 commented 3 years ago

@ziyou434 Could you provide the options that were used to start etcd?

I use the bitnami/etcd chart with `--set auth.rbac.enabled=false`. The chart uses setup.sh to start etcd.

#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

# Debug section
exec 3>&1
exec 4>&2

if [[ "${BITNAMI_DEBUG:-false}" = true ]]; then
    echo "==> Bash debug is on"
else
    echo "==> Bash debug is off"
    exec 1>/dev/null
    exec 2>/dev/null
fi

# Constants
HOSTNAME="$(hostname -s)"
AUTH_OPTIONS=""
export ETCDCTL_ENDPOINTS="etcd-0.etcd-headless.api-gateway.svc.cluster.local:2380"
export ROOT_PASSWORD="${ETCD_ROOT_PASSWORD:-}"
if [[ -n "${ETCD_ROOT_PASSWORD:-}" ]]; then
  unset ETCD_ROOT_PASSWORD
fi
# Functions
## Store member id for later member replacement
store_member_id() {
    while ! etcdctl $AUTH_OPTIONS member list; do sleep 1; done
    etcdctl $AUTH_OPTIONS member list | grep -w "$HOSTNAME" | awk '{ print $1}' | awk -F "," '{ print $1}' > "$ETCD_DATA_DIR/member_id"
    echo "==> Stored member id: $(cat ${ETCD_DATA_DIR}/member_id)" 1>&3 2>&4
    exit 0
}
## Configure RBAC
configure_rbac() {
    # When there's more than one replica, we can assume the 1st member
    # to be created is "etcd-0" since a statefulset is used
    if [[ -n "${ROOT_PASSWORD:-}" ]] && [[ "$HOSTNAME" == "etcd-0" ]]; then
        echo "==> Configuring RBAC authentication!" 1>&3 2>&4
        etcd &
        ETCD_PID=$!
        while ! etcdctl $AUTH_OPTIONS member list; do sleep 1; done
        echo "$ROOT_PASSWORD" | etcdctl $AUTH_OPTIONS user add root --interactive=false
        etcdctl $AUTH_OPTIONS auth enable
        kill "$ETCD_PID"
        sleep 5
    fi
}
## Checks whether there was a disaster or not
is_disastrous_failure() {
    local endpoints_array=(${ETCDCTL_ENDPOINTS//,/ })
    local active_endpoints=0
    local -r min_endpoints=$(((1 + 1)/2))

    for e in "${endpoints_array[@]}"; do
        if [[ "$e" != "$ETCD_ADVERTISE_CLIENT_URLS" ]] && (unset -v ETCDCTL_ENDPOINTS; etcdctl $AUTH_OPTIONS  endpoint health --endpoints="$e"); then
            active_endpoints=$((active_endpoints + 1))
        fi
    done
    if [[ $active_endpoints -lt $min_endpoints ]]; then
        true
    else
        false
    fi
}

## Check wether the member was succesfully removed from the cluster
should_add_new_member() {
    return_value=0
    if (grep -E "^Member[[:space:]]+[a-z0-9]+\s+removed\s+from\s+cluster\s+[a-z0-9]+$" "$(dirname "$ETCD_DATA_DIR")/member_removal.log") || \
       ! ([[ -d "$ETCD_DATA_DIR/member/snap" ]] && [[ -f "$ETCD_DATA_DIR/member_id" ]]); then
        rm -rf $ETCD_DATA_DIR/* 1>&3 2>&4
    else
        return_value=1
    fi
    rm -f "$(dirname "$ETCD_DATA_DIR")/member_removal.log" 1>&3 2>&4
    return $return_value
}

if [[ ! -d "$ETCD_DATA_DIR" ]]; then
    echo "==> Creating data dir..." 1>&3 2>&4
    echo "==> There is no data at all. Initializing a new member of the cluster..." 1>&3 2>&4
    store_member_id & 1>&3 2>&4
    configure_rbac
else
    echo "==> Detected data from previous deployments..." 1>&3 2>&4
    if [[ $(stat -c "%a" "$ETCD_DATA_DIR") != *700 ]]; then
        echo "==> Setting data directory permissions to 700 in a recursive way (required in etcd >=3.4.10)" 1>&3 2>&4
        chmod -R 700 $ETCD_DATA_DIR
    else
        echo "==> The data directory is already configured with the proper permissions" 1>&3 2>&4
    fi
    if [[ 1 -eq 1 ]]; then
        echo "==> Single node cluster detected!!" 1>&3 2>&4
    elif is_disastrous_failure; then
        echo "==> Cluster not responding!!" 1>&3 2>&4
        echo "==> Disaster recovery is disabled, the cluster will try to recover on its own..." 1>&3 2>&4
    elif should_add_new_member; then
        echo "==> Adding new member to existing cluster..." 1>&3 2>&4
        etcdctl $AUTH_OPTIONS member add "$HOSTNAME" --peer-urls="http://${HOSTNAME}.etcd-headless.api-gateway.svc.cluster.local:2380" | grep "^ETCD_" > "$ETCD_DATA_DIR/new_member_envs"
        sed -ie "s/^/export /" "$ETCD_DATA_DIR/new_member_envs"
        echo "==> Loading env vars of existing cluster..." 1>&3 2>&4
        source "$ETCD_DATA_DIR/new_member_envs" 1>&3 2>&4
        store_member_id & 1>&3 2>&4
    else
        echo "==> Updating member in existing cluster..." 1>&3 2>&4
        etcdctl $AUTH_OPTIONS member update "$(cat "$ETCD_DATA_DIR/member_id")" --peer-urls="http://${HOSTNAME}.etcd-headless.api-gateway.svc.cluster.local:2380" 1>&3 2>&4
    fi
fi
exec etcd 1>&3 2>&4

Could you log in to the etcd container and execute `ps aux | grep etcd` to see its options?

1001     1  0.5  0.1  10612200  25476  ?       Ssl  03:02  1:46  etcd
1001  6218  0.0  0.0      3088    900  pts/15  S+   08:47  0:00  grep etcd

tokers commented 3 years ago

That's really strange.

@ziyou434 Could you log in to the etcd container or the APISIX container and capture some packets?

sudo tcpdump -Ans0 'tcp and port 2379' -iany

We need to observe the data on the wire (HTTP request/response).

PS you may need to install tcpdump.

ziyou434 commented 3 years ago

sudo tcpdump -Ans0 'tcp and port 2379' -iany

I found that APISIX requests the etcd key /apisix/routes, but the key stored in etcd is /apisix/routes/, with an extra "/". The key in etcd differs from the one APISIX uses; maybe this is the error.
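To compare the keys APISIX asks for with what is actually stored, listing everything under the /apisix prefix directly in etcd is a quick check (a sketch; it assumes etcdctl is available inside the etcd container and that your endpoint/auth options are already set in the environment):

```shell
# List every key stored under the /apisix prefix via the v3 API,
# so the stored keys can be compared with the ones APISIX requests
# (e.g. /apisix/routes vs /apisix/routes/).
ETCDCTL_API=3 etcdctl get /apisix --prefix --keys-only
```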

ziyou434 commented 3 years ago

There is no yum in the command line of the container,I can not install tcpdump
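When the container image has no package manager, a common workaround is to capture from the Kubernetes node instead, entering only the container's network namespace (a sketch; it assumes you can reach the node, the runtime is Docker, and `CONTAINER_ID` is a placeholder for the etcd container's ID from `docker ps` — adjust for containerd/crictl):

```shell
# CONTAINER_ID is a placeholder: the etcd container's ID on this node.
CONTAINER_ID=abc123
# Resolve the container's host PID, then run the node's own tcpdump
# inside that container's network namespace (-n), so nothing needs to
# be installed in the container itself.
PID=$(docker inspect -f '{{.State.Pid}}' "$CONTAINER_ID")
nsenter -t "$PID" -n -- tcpdump -Ans0 'tcp and port 2379' -i any
```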

ziyou434 commented 3 years ago

I could not find the source of the problem, but I found a workaround: when I changed the etcd host from "http://my-etcd-headless.{namespace}.svc.cluster.local:2379" to "http://{ip address}:2379", no more errors were reported. There seems to be a problem with address resolution.
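If address resolution is the suspect, comparing what the headless service name resolves to from inside the APISIX container against the actual etcd pod IPs is a quick check (a sketch; `my-etcd-headless` and `{namespace}` are the placeholders from the comment above, to be replaced with your own service name and namespace):

```shell
# A headless Service has no cluster IP; the name should resolve to one
# A record per etcd pod. If this lookup fails or returns stale addresses
# from inside the APISIX container, APISIX cannot reach etcd by name.
getent hosts my-etcd-headless.{namespace}.svc.cluster.local
```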

nic-chen commented 3 years ago

I think this is the reason.

nic-chen commented 3 years ago

@souzens @yankunsam do you use a domain name as the etcd host too?

csh995426531 commented 3 years ago

Maybe it is because of this problem: https://stackoverflow.com/questions/54788528/etcd-v3-api-unavailable/56800553#56800553
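The linked answer is about the etcd v3 API not being reachable over HTTP. Since APISIX 2.0 talks to etcd through the v3 HTTP/JSON gateway, one quick probe is a range request over the /apisix/ prefix, the same kind of call that was failing in the error log (a sketch; the `/v3` path assumes etcd >= 3.4, older versions expose `/v3beta`, and keys in the JSON body must be base64-encoded):

```shell
# Probe the etcd v3 HTTP gateway the way APISIX does: a range request
# over the /apisix/ prefix. /apisix0 is the conventional range_end for
# that prefix, and both values are sent base64-encoded.
KEY=$(printf '/apisix/' | base64)   # L2FwaXNpeC8=
END=$(printf '/apisix0' | base64)   # L2FwaXNpeDA=
curl -s -X POST http://127.0.0.1:2379/v3/kv/range \
     -d "{\"key\": \"$KEY\", \"range_end\": \"$END\"}"
```

If this returns an error page or nothing at all while port 2379 is open, the gateway (and hence APISIX's access path) is the problem rather than the data.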