apache / apisix

The Cloud-Native API Gateway

help request: Error 500 after update to version 3.4.2 #9931

Closed: zimbres closed this issue 1 year ago

zimbres commented 1 year ago

Description

I'm running APISIX in a K3s Kubernetes cluster; the only changes from the default deployment are:

    plugin_attr: 
      redirect:
        https_port: 443
      etcd:
        host:
          - "http://etcd.etcd.svc.cluster.local:2379"
        prefix: "/apisix"
        timeout: 30 

Routes are created via the dashboard with HTTPS redirect enabled, and the certificates are also loaded via the dashboard.

With the 3.2.2-debian container everything works like a charm. When updated to version 3.4.1-debian, in a regular browser window only the first request works; subsequent requests fail with 500 Internal Server Error and the following message is logged:

2023/07/30 17:30:21 [error] 50#50: *35172 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/init.lua:332: attempt to index local 'matched_ssl' (a nil value)
stack traceback:
coroutine 0:
    /usr/local/apisix/apisix/init.lua: in function 'verify_https_client'
    /usr/local/apisix/apisix/init.lua:560: in function 'http_access_phase'
    access_by_lua(nginx.conf:336):2: in main chunk, client: 177.81.81.10, server: _, request: "GET / HTTP/2.0", host: "webhookinbox.zimbres.com"
2023/07/30 17:30:21 [error] 50#50: *35172 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/init.lua:332: attempt to index local 'matched_ssl' (a nil value)
stack traceback:
coroutine 0:
    /usr/local/apisix/apisix/init.lua: in function 'verify_https_client'
    /usr/local/apisix/apisix/init.lua:560: in function 'http_access_phase'
    access_by_lua(nginx.conf:336):2: in main chunk, client: 177.81.81.10, server: _, request: "GET /favicon.ico HTTP/2.0", host: "webhookinbox.zimbres.com", referrer: "https://webhookinbox.zimbres.com/"
177.81.81.10 - - [30/Jul/2023:17:30:21 +0000] webhookinbox.zimbres.com "GET / HTTP/2.0" 500 249 0.000 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0" - - - "http://webhookinbox.zimbres.com"
177.81.81.10 - - [30/Jul/2023:17:30:21 +0000] webhookinbox.zimbres.com "GET /favicon.ico HTTP/2.0" 500 249 0.000 "https://webhookinbox.zimbres.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0" - - - "http://webhookinbox.zimbres.com"

In a private browser window everything works, both the first and subsequent requests.
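
To make the traceback concrete, here is an illustrative Lua sketch of the failing pattern; the function and variable names come from the log above, while the body is a placeholder rather than the actual apisix/init.lua source:

    -- Illustrative sketch (placeholder body, not the real init.lua code).
    -- verify_https_client dereferences the SSL object matched during the
    -- TLS handshake. When no handshake ran for this request (for example,
    -- on a reused connection), matched_ssl is nil, indexing it raises
    -- "attempt to index local 'matched_ssl' (a nil value)", and the
    -- request is answered with 500.
    local function verify_https_client(ctx)
        local matched_ssl = ngx.ctx.matched_ssl   -- may be nil here
        if matched_ssl.value.client then          -- crashes when nil
            -- client-certificate verification would happen here
        end
    end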

By the way, I tried to move to version 3.4.1 to get the loki plugin, but I could not find it in the dashboard. What did I miss?

Environment

nginx version: openresty/1.21.4.1
built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
built with OpenSSL 1.1.1s  1 Nov 2022
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx
  --with-cc-opt='-O2 -DAPISIX_BASE_VER=1.21.4.1.8 -DNGX_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_HTTP_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl111/include'
  --add-module=../ngx_devel_kit-0.3.1 --add-module=../echo-nginx-module-0.62 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.32 --add-module=../ngx_lua-0.10.21 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.11
  --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -Wl,-rpath,/usr/local/openresty/wasmtime-c-api/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl111/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl111/lib'
  --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../mod_dubbo-1.0.2 --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../ngx_multi_upstream_module-1.1.1 --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../apisix-nginx-module-1.12.0 --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../apisix-nginx-module-1.12.0/src/stream --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../apisix-nginx-module-1.12.0/src/meta --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../wasm-nginx-module-0.6.4 --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../lua-var-nginx-module-v0.5.3 --add-module=/tmp/tmp.aLb1NUnBtM/openresty-1.21.4.1/../grpc-client-nginx-module-v0.4.2
  --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module

Sn0rt commented 1 year ago
2023/07/30 17:30:21 [error] 50#50: *35172 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/init.lua:332: attempt to index local 'matched_ssl' (a nil value)
stack traceback:
coroutine 0:
    /usr/local/apisix/apisix/init.lua: in function 'verify_https_client'
    /usr/local/apisix/apisix/init.lua:560: in function 'http_access_phase'

This seems to be due to an incorrect SSL configuration; I'm not sure whether it is caused by etcd. Can you confirm the relevant TLS configuration?

zimbres commented 1 year ago

This is the full config.yaml, so I suppose the other parameters are using the defaults.

apiVersion: v1
kind: ConfigMap
metadata:
  name: apisix
data:
  config.yaml: |-
    apisix:
      node_listen: 9080              # APISIX listening port
      enable_ipv6: false

    deployment:
      admin:
        allow_admin:               # https://nginx.org/en/docs/http/ngx_http_access_module.html#allow
          - 0.0.0.0/0              # We need to restrict ip access rules for security. 0.0.0.0/0 is for test.

        admin_key:
          - name: "admin"
            key: edd1c9f034335f136f87ad84b625c8f1
            role: admin                 # admin: manage all configuration data

      etcd:
        host:                           # it's possible to define multiple etcd hosts addresses of the same etcd cluster.
          - "http://etcd.etcd.svc.cluster.local:2379"          # multiple etcd address
        prefix: "/apisix"               # apisix configurations prefix
        timeout: 30                     # 30 seconds

    plugin_attr: 
      redirect:                         # Plugin: redirect
        https_port: 443                # Set the default port used to redirect HTTP to HTTPS.

Sn0rt commented 1 year ago

(quoting the full config.yaml from the previous comment)

If you use K3s, can you show the relevant SSL-related CRD information? For example, certificate information.

zimbres commented 1 year ago

I loaded the certificate directly in the APISIX dashboard: the cert and its key.

[screenshot: the certificate and key as loaded in the dashboard]

shreemaan-abhishek commented 1 year ago

@zimbres

  1. From which version did you update to 3.4.2?
  2. Consider using the Admin API for loading certs/keys; many users have reported compatibility issues between the dashboard and APISIX when managing keys/certs (see the sketch below).
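
A minimal sketch of such an Admin API call, assuming APISIX 3.x defaults (Admin API on port 9180, the /apisix/admin/ssls endpoint), the admin key from the config above, jq 1.6+, and placeholder file names server.crt and server.key:

    # Upload a cert/key pair as SSL object "1" for the affected host.
    curl -s -X PUT http://127.0.0.1:9180/apisix/admin/ssls/1 \
      -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
      -d "$(jq -n --rawfile cert server.crt --rawfile key server.key \
            '{cert: $cert, key: $key, snis: ["webhookinbox.zimbres.com"]}')"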

By the way, I tried to move to version 3.4.1 to have loki plugin, but I could not find it on dashboard, what I missed?

I don't see the dashboard repository being updated actively; maybe that is the reason? 🤔

zimbres commented 1 year ago

Hi, I mistyped; the target version is 3.4.1.

The start version was 3.2.2.

After some tests, version 3.3.0 also works perfectly, using certificates loaded either from the dashboard or via ingress.

Failure is on version 3.4.1 only.

zimbres commented 1 year ago

After further tests, even starting fresh, this combination is needed to reproduce the problem:

stonegithup commented 1 year ago

On line 332, after the "if", add "matched_ssl and", because matched_ssl can be nil. The higher-level cause is this block in the nginx.conf configuration file:

ssl_certificate_by_lua_block { apisix.http_ssl_phase() }

This config is not executed under the h2 protocol, and the browser can reproduce the issue 2-3 minutes after opening the page.
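
A minimal sketch of that suggested guard, with placeholder names around it (illustrative only, not the actual init.lua source):

    -- Before (crashes when matched_ssl is nil):
    --   if matched_ssl.value.client then ... end
    -- After (the suggested guard):
    if matched_ssl and matched_ssl.value.client then
        -- only verify the client certificate when an SSL object was
        -- actually matched during the TLS handshake
        verify_client_certificate(matched_ssl, ctx)  -- placeholder helper
    end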

Hope these leads help. [screenshot]

shreemaan-abhishek commented 1 year ago

@zimbres, please use the Admin API for loading certs/keys; many users have reported compatibility issues between the dashboard and APISIX when managing keys/certs.

zimbres commented 1 year ago

For the latest test that I performed, I loaded certs by referencing them with an ApisixTls CR.
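
For reference, a minimal sketch of such an ApisixTls resource; the Secret name and namespace are placeholders for a standard kubernetes.io/tls Secret holding the cert and key:

    apiVersion: apisix.apache.org/v2
    kind: ApisixTls
    metadata:
      name: webhookinbox
    spec:
      hosts:
        - webhookinbox.zimbres.com
      secret:
        name: webhookinbox-tls   # placeholder Secret name
        namespace: default       # placeholder namespace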

shreemaan-abhishek commented 1 year ago

@zimbres, please share your route and upstream configurations.

zimbres commented 1 year ago

Hi

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: webhookinbox
spec:
  http:
    - name: webhookinbox
      match:
        hosts:
          - webhookinbox.zimbres.com
        paths:
          - "/*"
      backends:
        - serviceName: webhookinbox
          servicePort: 80
      plugins:
        - name: redirect
          enable: true
          config: 
            http_to_https: true
        - name: tcp-logger
          enable: true
          config: 
            batch_max_size: 1
            host: "graylog.graylog.svc.cluster.local"
            name: "tcp logger"
            port: 5555
            tls: false
            include_req_body: true
        - name: response-rewrite
          enable: true
          config:
            headers:
              add: ["Strict-Transport-Security: max-age=31536000; includeSubDomains; preload"]

shreemaan-abhishek commented 1 year ago

Are you using the ingress controller and installing it via the Helm chart? If yes, please share the Helm chart installation command as well. Sorry for the hassle.

zimbres commented 1 year ago

No problem. It's Helm managed by Flux:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: apisix
  namespace: apisix
spec:
  interval: 1m0s
  url: https://charts.bitnami.com/bitnami
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: apisix
  namespace: apisix
spec:
  chart:
    spec:
      chart: apisix
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: apisix
      version: 2.0.3
  interval: 1m0s
  values:
    controlPlane:
      defaultConfig: |
        plugin_attr:
          redirect:
            https_port: 443
        {{- if .Values.controlPlane.metrics.enabled }}
        plugin_attr:
          prometheus:
            export_uri: /apisix/prometheus/metrics
            metric_prefix: apisix_
            enable_export_server: true
            export_addr:
              ip: 0.0.0.0
              port: {{ .Values.controlPlane.containerPorts.metrics }}
        {{- end }}
        nginx_config:
          error_log: /dev/stderr
          stream:
            access_log: /dev/stdout
          http:
            access_log: /dev/stdout
          http_configuration_snippet: |
            proxy_buffering off;
        apisix:
          control:
            ip: 0.0.0.0
            port: {{ .Values.controlPlane.containerPorts.control }}
        deployment:
          role: control_plane
          role_control_plane:
              config_provider: etcd
              conf_server:
                listen: 0.0.0.0:{{ .Values.controlPlane.containerPorts.configServer }}
                cert: /bitnami/certs/{{ .Values.controlPlane.tls.certFilename }}
                cert_key: /bitnami/certs/{{ .Values.controlPlane.tls.certKeyFilename }}
          etcd:
            host:
              {{- if .Values.etcd.enabled  }}
                {{- $replicas := $.Values.etcd.replicaCount | int }}
                {{- range $i, $_e := until $replicas }}
              - {{ printf "%s://%s-%d.%s:%v" (ternary "https" "http" $.Values.etcd.auth.client.secureTransport) (include "apisix.etcd.fullname" $ ) $i (include "apisix.etcd.headlessServiceName" $) ( include "apisix.etcd.port" $ ) }}
                {{- end }}
              {{- else }}
              {{- range $node := .Values.externalEtcd.servers }}
              - {{ ternary "https" "http" $.Values.externalEtcd.secureTransport }}://{{ printf "%s:%v" $node (include "apisix.etcd.port" $) }}
              {{- end }}
              {{- end }}
            prefix: /apisix
            timeout: 30
            use_grpc: false
            startup_retry: 60
            {{- if (include "apisix.etcd.authEnabled" .) }}
            user: "{{ print "{{APISIX_ETCD_USER}}" }}"
            password: "{{ print "{{APISIX_ETCD_PASSWORD}}" }}"
            {{- end }}
          {{- if .Values.controlPlane.tls.enabled }}
          certs:
            {{- if .Values.controlPlane.tls.enabled }}
            cert: /bitnami/certs/{{ .Values.controlPlane.tls.certFilename }}
            cert_key: /bitnami/certs/{{ .Values.controlPlane.tls.certKeyFilename }}
            {{- if .Values.controlPlane.tls.certCAFilename }}
            client_ca_cert: /bitnami/certs/{{ .Values.controlPlane.tls.certCAFilename }}
            {{- end }}
            {{- end }}
          {{- end }}
          admin:
            {{- if .Values.controlPlane.tls.enabled }}
            https_admin: true
            admin_api_mtls:
              admin_ssl_cert: /bitnami/certs/{{ .Values.controlPlane.tls.certFilename }}
              admin_ssl_cert_key: /bitnami/certs/{{ .Values.controlPlane.tls.certKeyFilename }}
            {{- end }}

            allow_admin:
              - 0.0.0.0/0

            admin_key:
              - name: admin
                key: "{{ print "{{APISIX_ADMIN_API_TOKEN}}" }}"
                role: admin
              - name: viewer
                key: "{{ print "{{APISIX_VIEWER_API_TOKEN}}" }}"
                role: viewer
            admin_listen:
                port: {{ .Values.controlPlane.containerPorts.adminAPI }}
            enable_admin_cors: true         # Admin API support CORS response headers.
        discovery:
          kubernetes:
            service:
              schema: https #default https

              # apiserver host, options [ipv4, ipv6, domain, environment variable]
              host: ${KUBERNETES_SERVICE_HOST}

              # apiserver port, options [port number, environment variable]
              port: ${KUBERNETES_SERVICE_PORT}

            client:
              # serviceaccount token or token_file
              token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

            default_weight: 50 # weight assigned to each discovered endpoint. default 50, minimum 0
    dataPlane:
      metrics:
        enabled: true
      defaultConfig: |
        {{- if .Values.dataPlane.metrics.enabled }}
        plugin_attr:
          redirect:
            https_port: 443
          prometheus:
            export_uri: /apisix/prometheus/metrics
            metric_prefix: apisix_
            enable_export_server: true
            export_addr:
              ip: 0.0.0.0
              port: {{ .Values.dataPlane.containerPorts.metrics }}
        {{- else }}
        plugin_attr:
          redirect:
            https_port: 443
        {{- end }}
        apisix:
          node_listen: {{ .Values.dataPlane.containerPorts.http }}
          enable_admin: false
          {{- if .Values.dataPlane.tls.enabled }}
          ssl:
            enable: true
            listen:
              - port: {{ .Values.dataPlane.containerPorts.https }}
                enable_http2: true
            ssl_trusted_certificate: /bitnami/certs/{{ .Values.dataPlane.tls.certCAFilename }}
          {{- end }}
          control:
            ip: 0.0.0.0
            port: {{ .Values.dataPlane.containerPorts.control }}
        nginx_config:
          error_log: /dev/stderr
          stream:
            access_log: /dev/stdout
          http:
            access_log: /dev/stdout
          http_configuration_snippet: |
            proxy_buffering off;
        deployment:
          role: data_plane
          role_data_plane:
            config_provider: control_plane
            {{- if .Values.controlPlane.enabled }}
            control_plane:
              host:
                - {{ ternary "https" "http" .Values.controlPlane.tls.enabled }}://{{ include "apisix.control-plane.fullname" . }}:{{ .Values.controlPlane.service.ports.configServer }}
              prefix: /apisix
              timeout: 30
            {{- end }}
          {{- if .Values.dataPlane.tls.enabled }}
          certs:
            {{- if .Values.dataPlane.tls.enabled }}
            cert: /bitnami/certs/{{ .Values.dataPlane.tls.certFilename }}
            cert_key: /bitnami/certs/{{ .Values.dataPlane.tls.certKeyFilename }}
            {{- if .Values.dataPlane.tls.certCAFilename }}
            client_ca_cert: /bitnami/certs/{{ .Values.dataPlane.tls.certCAFilename }}
            {{- end }}
            {{- end }}
          {{- end }}
        discovery:
          kubernetes:
            service:
              # apiserver schema, options [http, https]
              schema: https #default https

              # apiserver host, options [ipv4, ipv6, domain, environment variable]
              host: ${KUBERNETES_SERVICE_HOST} #default ${KUBERNETES_SERVICE_HOST}

              # apiserver port, options [port number, environment variable]
              port: ${KUBERNETES_SERVICE_PORT}  #default ${KUBERNETES_SERVICE_PORT}

            client:
              # serviceaccount token or token_file
              token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

            default_weight: 50 # weight assigned to each discovered endpoint. default 50, minimum 0
      service:
        externalTrafficPolicy: Local
    dashboard:
      service:
        type: ClusterIP
        ports:
          http: 80
      enable: true
      username: admin
      password: "Admin123"
      defaultConfig: |
        conf:
          listen:
            host: 0.0.0.0
            port: {{ .Values.dashboard.containerPorts.http }}
          {{- if .Values.dashboard.tls.enabled }}
          ssl:
            host: 0.0.0.0
            port: {{ .Values.dashboard.containerPorts.https }}
            cert: /bitnami/certs/{{ .Values.dashboard.tls.certFilename }}
            key: /bitnami/certs/{{ .Values.dashboard.tls.certKeyFilename }}
          {{- end }}
          etcd:
            prefix: "/apisix"
            endpoints:
              {{- if .Values.etcd.enabled  }}
                {{- $replicas := $.Values.etcd.replicaCount | int }}
                {{- range $i, $_e := until $replicas }}
              - {{ printf "%s://%s-%d.%s:%v" (ternary "https" "http" $.Values.etcd.auth.client.secureTransport) (include "apisix.etcd.fullname" $ ) $i (include "apisix.etcd.headlessServiceName" $) ( include "apisix.etcd.port" $ ) }}
                {{- end }}
              {{- else }}
              {{- range $node :=.Values.externalEtcd.servers }}
              - {{ printf "%s:%v" $node (include "apisix.etcd.port" $) }}
              {{- end }}
              {{- end }}
            {{- if (include "apisix.etcd.authEnabled" .) }}
            username: "{{ print "{{ APISIX_ETCD_USER }}" }}"
            password: "{{ print "{{ APISIX_ETCD_PASSWORD }}" }}"
            {{- end }}
          log:
            error_log:
              level: warn
              file_path: /dev/stderr
            access_log:
              file_path: /dev/stdout
        authentication:
          secret: secret
          expire_time: 3600
          users:
            - username: "{{ print "{{ APISIX_DASHBOARD_USER }}" }}"
              password: "{{ print "{{ APISIX_DASHBOARD_PASSWORD }}" }}"
        plugins:
          - api-breaker
          - authz-casbin
          - authz-casdoor
          - authz-keycloak
          - aws-lambda
          - azure-functions
          - basic-auth
          - batch-requests
          - clickhouse-logger
          - client-control
          - consumer-restriction
          - cors
          - csrf
          - datadog
          # - dubbo-proxy
          - echo
          - error-log-logger
          - ext-plugin-post-req
          - ext-plugin-post-resp
          - ext-plugin-pre-req
          - fault-injection
          - file-logger
          - forward-auth
          - google-cloud-logging
          - grpc-transcode
          - grpc-web
          - gzip
          - hmac-auth
          - http-logger
          - ip-restriction
          - jwt-auth
          - kafka-logger
          - kafka-proxy
          - key-auth
          - ldap-auth
          - limit-conn
          - limit-count
          - limit-req
          - loggly
          - log-rotate
          - mocking
          - node-status
          - opa
          - openid-connect
          - opentelemetry
          - openwhisk
          - prometheus
          - proxy-cache
          - proxy-control
          - proxy-mirror
          - proxy-rewrite
          - public-api
          - real-ip
          - redirect
          - referer-restriction
          - request-id
          - request-validation
          - response-rewrite
          - rocketmq-logger
          - server-info
          - serverless-post-function
          - serverless-pre-function
          - skywalking
          - skywalking-logger
          - sls-logger
          - splunk-hec-logging
          - syslog
          - tcp-logger
          - traffic-split
          - ua-restriction
          - udp-logger
          - uri-blocker
          - wolf-rbac
          - zipkin
          - elasticsearch-logger
          - openfunction
          - tencent-cloud-cls
          - ai
          - cas-auth
    ingressController:
      enabled: true
    externalEtcd:
      servers:
      - etcd.etcd.svc.cluster.local
      port: 2379
    etcd:
      enabled: false

alucryd commented 1 year ago

I'm in the same situation, installed on GKE using the Helm chart (v1.5.1). I'm using the Admin API directly for almost everything, except the certificates, which are added via ApisixTls resources and the ingress controller (which I understand also calls the Admin API anyway).

Initial requests appear to work fine, but after a while I get the same lua errors and apisix becomes unusable until I restart the pod.

I have issues when browsing the dashboard via the gateway (using Brave), but also with a few Express APIs: calling them via Thunder Client (in VS Code) works exactly once, then the gateway returns 500. Interestingly, calling the APIs via Insomnia works just fine, so some clients fare better than others.

Revolyssup commented 1 year ago

This is an intermittent bug seen during some load tests and on some particular browsers by other users. It is also being tracked here: https://github.com/apache/apisix/issues/9610

Revolyssup commented 1 year ago

A fix has been created for this, with the reasons explained here: https://github.com/apache/apisix/pull/10066

zimbres commented 1 year ago

3.5 fixes it.

Thanks

kingluo commented 1 year ago

We can no longer reproduce the issue on the master branch, because https://github.com/apache/apisix/pull/9903, merged after 3.4.1, adds ssl_client_hello_by_lua_block. This phase, as used by APISIX, always constructs ngx.ctx.matched_ssl:

https://github.com/apache/apisix/blob/f47c2d79f7cf2d8618189924fc12a226c8420564/apisix/init.lua#L205-L207
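
A minimal sketch of why that closes the hole, with placeholder names (the permalink above points at the real code): the client-hello phase runs once for every new TLS connection, including HTTP/2 connections that later carry many requests, so the access phase always finds a non-nil matched_ssl.

    -- Illustrative only; the names below are placeholders.
    -- Runs in ssl_client_hello_by_lua_block for each new TLS connection:
    local api_ctx = {}
    match_ssl_by_sni(api_ctx, sni)               -- placeholder SNI match
    ngx.ctx.matched_ssl = api_ctx.matched_ssl    -- later phases rely on this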