apache / apisix

bug: requests with Istio mTLS enabled fail with connection termination #7377

Closed svilenvul closed 9 months ago

svilenvul commented 2 years ago

Current Behavior

We are now using APISIX in a Kubernetes setup with Helm (https://github.com/apache/apisix-helm-chart). APISIX is running as a service in the Istio Service Mesh with Envoy sidecar applied on it.

We faced an issue where, after we enabled mTLS with Istio, requests targeted at APISIX failed. During debugging we saw that the authority header of the outgoing requests from APISIX was always set to apisix_backend. We think that this confuses Istio during mTLS and results in the request failure.

Expected Behavior

Requests should be successful both with Istio mTLS enabled and disabled.

Error Logs

Request Headers Info (from client grpcurl)

authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eyJleHAiOjE2NTY2ODg4NzUsImlhdCI6MTY1NjYwMjQ3NSwiYXV0aF90aW1lIjoxNjU2NDIwNzAzLCJqdGkiOiI5ZTk1ZDdhZS0yNzlmLTRlNTktODE0Yi1mYzNkMzNmYmM4MDEiLCJpc3MiOiJodHRwczovL3RvZ2dpZC50b2dnLmNsb3VkL2F1dGgvcmVhbG1zL3RvZ2dpZCIsInN1YiI6IjgwZTkxMzYwLWVmOTAtNDI5OC05ODRkLTcxNDBiNDY5NTFlMyIsInR5cCI6IkJlYXJlciIsImF6cCI6InN1cGVyLWFwcCIsInNlc3Npb25fc3RhdGUiOiI5ZDUwNmFjMS1mNGZlLTRiOTUtYjJkNy1iYTdjOTg0ODc1MjUiLCJhY3IiOiIwIiwic2NvcGUiOiJvcGVuaWQgZW1haWwgcHJvZmlsZSIsInNpZCI6IjlkNTA2YWMxLWY0ZmUtNGI5NS1iMmQ3LWJhN2M5ODQ4NzUyNSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJuYW1lIjoiVGVzdCBUZXN0IiwicHJlZmVycmVkX3VzZXJuYW1lIjoidGVzdEB0b2dnLmNsb3VkIiwiZ2l2ZW5fbmFtZSI6IlRlc3QiLCJmYW1pbHlfbmFtZSI6IlRlc3QiLCJlbWFpbCI6InRlc3RAdG9nZy5jbG91ZCJ9.BK-EGfdyi7DoIQTRYxUFBK54f4g2IyAK6DlQDinDldjf2OXFRyWIK9OwN7Q5-hW5BO0hn0huJ4aQZ59WGUdZ5RjZqVCV3-w2ybr7BXHkwKJYnjrB0lcFy4in1WB_eiD4TMBdqb7vG6dxC8bGdm8YmBfFvJ7Ufghle33pjj67k8SJj3zUFRBK-f4umKesakfTlhlMMdALbCTxV9jIoXPtDpvDEF6V89N7LKnnoV8Q3lPBF56PGeBokdqEJLfsb5ZQcaMeW8Fi38adqZTa8A4WefoRRsOrgEhXMYoU8DrY1EWvatgms4vJKag6bygkp_2nsNKT__hoYIDBvvJMke60VQ

Response headers (from client grpcurl)

content-length: 0
content-type: application/grpc
date: Thu, 30 Jun 2022 15:39:53 GMT
server: istio-envoy
x-envoy-upstream-service-time: 84

Logs from Envoy Proxy sidecar container for APISIX

{
    "duration":1,
    "downstream_remote_address":"172.20.50.105:0",
    "upstream_service_time":null,
    "upstream_local_address":"10.234.106.162:50972",
    "response_code_details":"upstream_reset_before_response_started{connection_termination}",
    "upstream_transport_failure_reason":null,
    "route_name":"allow_any",
    "response_code":200,
    "upstream_host":"10.234.29.1:80",
    "user_agent":"grpcurl/v1.8.1 grpc-go/1.37.0",
    "downstream_local_address":"10.234.29.1:80",
    "x_forwarded_for":"172.20.50.105",
    "connection_termination_details":null,
    "protocol":"HTTP/2",
    "upstream_cluster":"PassthroughCluster",
    "authority":"apisix_backend",
    "method":"POST",
    "start_time":"2022-06-30T15:39:53.730Z",
    "path":"/xxxx.yyyy.ms.profile.ProfileService/GetUserProfile",
    "bytes_received":51,
    "response_flags":"UC",
    "request_id":"7e4abc76-27f4-4f40-b663-056c648608b7",
    "bytes_sent":0,
    "requested_server_name":null
}
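
One way to read this log entry: the `UC` response flag is Envoy's marker for an upstream connection termination, and `route_name` `allow_any` together with `upstream_cluster` `PassthroughCluster` suggests that the sidecar did not match the `apisix_backend` authority to any service it knows, so it passed the request through as-is. The sketch below pulls those fields out of a log line without requiring extra tools. The `field` helper is a hypothetical name, and the sample line is trimmed from the entry above.

```shell
# One sidecar access log entry, trimmed to the fields of interest
# (values copied from the report above).
log='{"response_flags":"UC","route_name":"allow_any","upstream_cluster":"PassthroughCluster","authority":"apisix_backend"}'

# Extract a string-valued field from a flat, single-line JSON object.
# This is a sed-based sketch for quick triage, not a general JSON parser.
field() {
  printf '%s\n' "$log" | sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

authority=$(field authority)
cluster=$(field upstream_cluster)
flags=$(field response_flags)

printf 'authority=%s cluster=%s flags=%s\n' "$authority" "$cluster" "$flags"
```

If the authority printed here is not a hostname the mesh knows, that is consistent with the request leaving the sidecar through the passthrough path.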

Logs from APISIX container

127.0.0.6 - - [30/Jun/2022:15:39:53 +0000] xxxxx-api-gateway.xxxx.cloud:9443 "POST /xxxx.yyyy.ms.profile.ProfileService/GetUserProfile HTTP/2.0" 200 0 0.005 "-" "grpcurl/v1.8.1 grpc-go/1.37.0" 10.234.29.1:80 200 0.004 "grpc://xxxxxx-api-gateway.xxxx.cloud:9443"

Steps to Reproduce

  1. Install Istio in k8s cluster
  2. Enable Istio Strict mTLS
    $ kubectl apply -n istio-system -f - <<EOF
    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: "default"
    spec:
      mtls:
        mode: STRICT
    EOF
  3. Create namespace for APISIX and enable auto injection for Istio
  4. Install APISIX with a Helm chart
  5. Install a gRPC Service in the same namespace
  6. Create Route and Upstream in APISIX
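
Steps 3 and 4 might look like the following sketch. The namespace name `apisix`, the Helm release name, and the file name are assumptions for illustration; the `istio-injection: enabled` label is what turns on automatic sidecar injection for a namespace.

```shell
# Step 3 (sketch): a namespace with automatic Istio sidecar injection enabled.
# The name "apisix" is an assumption; use your own namespace.
cat > apisix-namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: apisix
  labels:
    istio-injection: enabled
EOF

# Apply it, then install the chart into that namespace (step 4), for example:
#   kubectl apply -f apisix-namespace.yaml
#   helm repo add apisix https://charts.apiseven.com
#   helm install apisix apisix/apisix --namespace apisix
```

Pods created in the namespace after the label is set should come up with an Envoy sidecar container next to the APISIX container.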


Environment

- APISIX version: 2.12.1
- k8s version: 1.20.7
- Istio version: 1.10.3

svilenvul commented 2 years ago

@tao12345666333 this issue is a continuation of https://the-asf.slack.com/archives/CUC5MN17A/p1656432528471899

tokers commented 2 years ago

Strange. According to the upstream configuration you provided, APISIX should use grpc-service.namespace.svc.cluster.local as the host (authority) header.

marziman commented 2 years ago

@tao12345666333 do you have an idea why this is happening? We expected that the authority would not be changed.

tao12345666333 commented 2 years ago

There is a description here.

https://github.com/apache/apisix/blob/master/docs/en/latest/stream-proxy.md

But in order to solve the actual problem here, I need to know your full request chain.

Below is my understanding; please correct me if I have misunderstood.

Client -------> Istio IngressGateway ---> Envoy  --->   Envoy 
         TLS.                               |             |
                                            V             V
                                          APISIX        Backend

svilenvul commented 2 years ago

@tao12345666333, yes the request chain is correct.

tao12345666333 commented 2 years ago

I'm working on a scenario where this behavior can be bypassed. I'll update once I have results

svilenvul commented 2 years ago

Great. If I understand, you were able to reproduce it, right?

tao12345666333 commented 2 years ago

Yes! I'm trying to find a way around this behavior.

tao12345666333 commented 2 years ago

Also, I'm trying to understand your current deployment architecture.

tao12345666333 commented 2 years ago

Hi, could you please provide me with the following information:

thanks!

svilenvul commented 2 years ago

Hi, @tao12345666333, I will be glad to help you. Can you tell me how I can get this information for you?

tao12345666333 commented 2 years ago

@svilenvul thanks!

Request routing directly in APISIX container, check its request and response headers

You can just run:

kubectl exec -n <YOUR NAMESPACE> deploy/<APISIX deployment> -- curl -vv localhost:<APISIX listen port>/<ROUTE path> -H "Host: <your route host>" <and something else>

Request routing from another container, check its request and response headers

kubectl exec -n <YOUR NAMESPACE> deploy/<another deployment> -- curl -vv <APISIX-gateway service name>:<APISIX-gateway service listen port>/<ROUTE path> -H "Host: <your route host>" <and something else>

svilenvul commented 2 years ago

FYI, currently istio mTLS is in permissive mode (the issue occurs only in strict mode)

Request routing directly in APISIX container, check its request and response headers

kubectl exec -n xxx-id-system deploy/tiam-ms-apigateway-apisix -- curl -vv http://localhost:9080/xxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments -H "HOST: xxx-api-gateway.xxx.cloud" --http2

> GET /xxxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments HTTP/1.1
> Host: xxx-api-gateway.xxx.cloud
> User-Agent: curl/7.79.1
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
> 
* Received HTTP/0.9 when not allowed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Closing connection 0
curl: (1) Received HTTP/0.9 when not allowed
command terminated with exit code 1

Request routing from another container, check its request and response headers

kubectl exec -n xxx-id-system deploy/tiam-core-authn -- curl -vv http://tiam-ms-apigateway-apisix-gateway:80/xxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments -H "HOST: xxx-api-gateway.xxx.cloud" --http2

> GET /xxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments HTTP/1.1
> Host: xxx-api-gateway.xxx.cloud
> User-Agent: curl/7.61.1
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAARAAAAAAAIAAAAA
> 
< HTTP/1.1 200 OK
< date: Tue, 26 Jul 2022 07:35:52 GMT
< content-type: application/grpc
< content-length: 0
< grpc-status: 7
< grpc-message: RBAC: access denied
< x-envoy-upstream-service-time: 62
< server: envoy

tao12345666333 commented 2 years ago

Thanks

svilenvul commented 2 years ago

@tao12345666333 do you have any updates on this issue?

svilenvul commented 2 years ago

Our issue might be related to https://github.com/apache/apisix/issues/7573. I think that if we had control over this header, it would solve our issue with mTLS as well.

tao12345666333 commented 2 years ago

@svilenvul hi, I sent you an email yesterday.

I haven't tried proxy-rewrite, so I don't know if it can be a solution. Can you try it directly in your environment?

svilenvul commented 2 years ago

I tried to edit the route and add a plugins section with the proxy-rewrite plugin, but I could not manage to configure this plugin by adding:

"plugins": {
    "proxy-rewrite": {
        "host": "...."
    }
},

When I save the route setting, the change is not applied. I tried to apply from the dashboard as well, but I don't see the proxy-rewrite plugin in the UI when I select the route, click edit and navigate to the plugins section. I was not able to verify if this plugin will solve the issue for us.
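
If the Dashboard does not expose the plugin, the same change can be attempted through the Admin API instead. The sketch below only prepares the request body. The host value is the one mentioned earlier in this thread, and the admin address, route ID, and key in the commented `curl` line are placeholders. Whether `proxy-rewrite` actually changes the gRPC authority on this APISIX version is what this thread is trying to establish, so treat it as an experiment rather than a fix.

```shell
# Request body that sets the upstream Host via the proxy-rewrite plugin.
# The host below is taken from the earlier comment; replace it with your service.
cat > proxy-rewrite-patch.json <<'EOF'
{
  "plugins": {
    "proxy-rewrite": {
      "host": "grpc-service.namespace.svc.cluster.local"
    }
  }
}
EOF

# Send it as a partial update to an existing route (placeholders in <>):
#   curl -X PATCH "http://<ADMIN_ADDR>/apisix/admin/routes/<ROUTE_ID>" \
#        -H "X-API-KEY: <ADMIN_KEY>" -d @proxy-rewrite-patch.json
```

Reading the route back with a GET on the same URL afterwards shows whether the plugin section was stored.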

tao12345666333 commented 2 years ago

> I tried to apply from the dashboard as well, but I don't see the proxy-rewrite plugin in the UI when I select the route, click edit and navigate to the plugins section.

@juzhiyuan @bzp2010 Do you know what restrictions are on the dashboard?

juzhiyuan commented 2 years ago

In the Route -> Create page, the plugin is available in the UI, see:

[image]

juzhiyuan commented 2 years ago

> > I tried to apply from the dashboard as well, but I don't see the proxy-rewrite plugin in the UI when I select the route, click edit and navigate to the plugins section.
>
> @juzhiyuan @bzp2010 Do you know what restrictions are on the dashboard?

APISIX Dashboard only works well with apache/apisix for now, AFAIK. Does this mean it's recommended to use Dashboard to control Ingress?

tao12345666333 commented 2 years ago

I don't think @svilenvul is using APISIX Ingress controller.

marziman commented 2 years ago

@tao12345666333 we are not using the APISIX ingress controller. We were not able to see or use the proxy-rewrite plugin to solve this.

Can you please help, @tao12345666333? This has been blocking us for an extremely long time. We would actually be fine if we could control the header as described at https://github.com/apache/apisix/issues/7377#issuecomment-1205451777

Please guide us on what to do. Thanks

tao12345666333 commented 2 years ago

@tzssangglass @tokers @spacewander Can someone please pick it up? I guess this requires some APISIX related modifications.

Or maybe there is something I didn't notice

marziman commented 2 years ago

We would be really thankful! Everyone using APISIX in k8s with Istio will face this issue, and I (and all of you) hope many people will do so in a service mesh setup. The issue is essentially the same as https://github.com/apache/apisix/issues/7573. We cannot modify the authority, and this causes Istio to reject the request completely.

Many thanks for all your hard work and help. BR Mehmet

juzhiyuan commented 2 years ago

Hi @marziman, I have emailed you and Mattiullah but without a reply.

This issue really takes a long time, and to better help your business resolve those issues, could you please pick one slot at your convenience from https://meetings.hubspot.com/zhiyuan? I will invite apisix's maintainers to attend. 😉

marziman commented 2 years ago

Hello @juzhiyuan

We gave all the inputs to test this. There is one issue https://github.com/apache/apisix/issues/7573 which is what we are facing.

Can one of the core maintainers say something? This authority header is breaking things when Istio and APISIX are used together.

All is described in the issue, a meeting would not bring anything else to the table.

@tao12345666333 @tokers @spacewander I think you forgot about this topic. Could you please engage 🙏🏻

BR Mehmet

juzhiyuan commented 2 years ago

@marziman Hi, sure, let me check with teammates.

spacewander commented 2 years ago

@marziman I just submitted https://github.com/apache/apisix/pull/7939/files for it. Does adding grpc_set_header "Host" $upstream_host; ahead of grpc_set_header Content-Type application/grpc; in apisix/cli/ngx_tpl.lua solve your problem?

marziman commented 2 years ago

@spacewander many thanks! Is there a way we can get an unofficial APISIX Docker image with these changes applied, so we can deploy it with our APISIX Helm charts (we are using your official Helm charts)? Then we can test this quickly and report back to you.

bzp2010 commented 2 years ago

Hi, @marziman.

I rebuilt the image so you can give it a quick try. The image tag is bzp2010/test:apisix-2.15.0-grpc-authority-patch. Note: do not use my binary directly in production; for production you will need to build the image yourself to ensure security.

I use this patch:

From 23d5828410cc0ae93758b5282678669f0a2e8d60 Mon Sep 17 00:00:00 2001
From: Zeping Bai <bzp2010@apache.org>
Date: Tue, 20 Sep 2022 10:23:27 +0800
Subject: [PATCH] fix: grpc authority header

---
 apisix/cli/ngx_tpl.lua | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/apisix/cli/ngx_tpl.lua b/apisix/cli/ngx_tpl.lua
index 130033be..5ba058a4 100644
--- a/apisix/cli/ngx_tpl.lua
+++ b/apisix/cli/ngx_tpl.lua
@@ -775,6 +775,14 @@ http {
                 apisix.grpc_access_phase()
             }

+            {% if use_apisix_openresty then %}
+            # For servers which obey the standard, when `:authority` is missing,
+            # `host` will be used instead. When used with apisix-base, we can do
+            # better by setting `:authority` directly
+            grpc_set_header   ":authority" $upstream_host;
+            {% else %}
+            grpc_set_header   "Host" $upstream_host;
+            {% end %}
             grpc_set_header   Content-Type application/grpc;
             grpc_socket_keepalive on;
             grpc_pass         $upstream_scheme://apisix_backend;
-- 
2.34.1

juzhiyuan commented 2 years ago

> Note that do not use for production

Hi @bzp2010, could you please explain this in more detail? e.g., which points this image will take? Thanks!

bzp2010 commented 2 years ago

OK, as a supplement, I provide my build script, which only adds the patching operation to the official script.

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM debian:bullseye-slim

ARG APISIX_VERSION=2.15.0

COPY ./0001-fix-grpc-authority-header.patch /0001-fix-grpc-authority-header.patch

RUN set -ex; \
    arch=$(dpkg --print-architecture); \
    apt update; \
    apt-get -y install --no-install-recommends wget gnupg ca-certificates git; \
    codename=`grep -Po 'VERSION="[0-9]+ \(\K[^)]+' /etc/os-release`; \
    wget -O - https://openresty.org/package/pubkey.gpg | apt-key add -; \
    case "${arch}" in \
      amd64) \
        echo "deb http://openresty.org/package/debian $codename openresty" | tee /etc/apt/sources.list.d/openresty.list \
        && wget -O - http://repos.apiseven.com/pubkey.gpg | apt-key add - \
        && echo "deb http://repos.apiseven.com/packages/debian $codename main" | tee /etc/apt/sources.list.d/apisix.list \
        ;; \
      arm64) \
        echo "deb http://openresty.org/package/arm64/debian $codename openresty" | tee /etc/apt/sources.list.d/openresty.list \
        && wget -O - http://repos.apiseven.com/pubkey.gpg | apt-key add - \
        && echo "deb http://repos.apiseven.com/packages/arm64/debian $codename main" | tee /etc/apt/sources.list.d/apisix.list \
        ;; \
    esac; \
    apt update \
    && apt install -y apisix=${APISIX_VERSION}-0 \
    && rm -f /etc/apt/sources.list.d/openresty.list /etc/apt/sources.list.d/apisix.list \
    && openresty -V \
    && apisix version \
    && cd /usr/local/apisix && git apply /0001-fix-grpc-authority-header.patch && cat apisix/cli/ngx_tpl.lua \
    && apt-get purge -y --auto-remove

WORKDIR /usr/local/apisix

# forward request and error logs to docker log collector
RUN ln -sf /dev/stdout /usr/local/apisix/logs/access.log \
    && ln -sf /dev/stderr /usr/local/apisix/logs/error.log

EXPOSE 9080 9443

COPY ./docker-entrypoint.sh /docker-entrypoint.sh

ENTRYPOINT ["/docker-entrypoint.sh"]

CMD ["docker-start"]

STOPSIGNAL SIGQUIT

Here's what I added. In addition, I installed git, which is used to apply the patch.

cd /usr/local/apisix && git apply /0001-fix-grpc-authority-header.patch && cat apisix/cli/ngx_tpl.lua

marziman commented 2 years ago

@bzp2010 I did not fully understand. Can we use your image tag to test whether this will solve our authority header issue? Can we just adjust the APISIX Helm chart to use this image, or do we need to provide any further configuration?

I am thinking to add your image tag in the helm chart at: https://github.com/apache/apisix-helm-chart/blob/78cb59f1fe6c8ce16681e593faefc80aad1d1879/charts/apisix/values.yaml#L51

With these values:

image:
  repository: bzp2010/test   # was: apache/apisix
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: apisix-2.15.0-grpc-authority-patch   # was: 2.15.0-alpine

Please guide us, so we can help to test this valuable work of yours. Many thanks!

bzp2010 commented 2 years ago

@marziman

Yes, you can do this, but only for testing purposes. If it works, I recommend that you follow my steps to build the APISIX image yourself to ensure that the build product is safe and trustworthy. If you run into a problem, I will help you fix it.

marziman commented 2 years ago

@bzp2010 Why not merge this upstream if our test is successful? Can't we then get this into APISIX through the official image?

bzp2010 commented 2 years ago

@marziman

Of course. A fix has been merged into the main development branch, and it is now up to you to determine whether that mitigation works. If it doesn't work, we will continue to improve it; otherwise it will be included in the next release. If it does not cause a breaking change for the older LTS release, it will be backported to the LTS branch and included in the next LTS release.

Note that APISIX does not immediately release a version for each bugfix (unless it causes a bug that seriously affects operation); they will be released as an accumulation, in the next release cycle.

BTW, the expected next release in the development branch is a preview version of v3. The current LTS version is 2.15.

bzp2010 commented 2 years ago

Hi, @spacewander.

Will this fix (#7939) be backported to the 2.15 LTS version? As far as I know, this is not a breaking change. If it can be, we had better add the need backport tag to it.

svilenvul commented 2 years ago

Thank you for the applied changes. We used the bzp2010/test:apisix-2.15.0-grpc-authority-patch image in our development cluster and tried again to reach our microservice with Istio mTLS enabled.

The result is that we are getting the following error on the client side:

Response headers received:
(empty)

Response trailers received:
(empty)
Sent 1 request and received 0 responses
ERROR:
  Code: Unavailable
  Message: Bad Gateway: HTTP status code 502; transport: received the unexpected content-type "text/html; charset=utf-8"

bzp2010 commented 2 years ago

@svilenvul If I understand correctly, this means that APISIX is returning the HTTP 502 error?

If so, can you check the APISIX error log? It should contain the upstream error; in K8s environments it is usually found in the pod log.

juzhiyuan commented 2 years ago

Hi Team, @svilenvul @marziman. To help resolve this issue, may I ask whether there are any updates on your side?

svilenvul commented 2 years ago

Hi @bzp2010 the logs of APISIX show the following lines


apisix 2022/10/25 07:10:36 [error] 49#49: *5385 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.6, server: _, request: "POST /xx.yyyyy.integrations.v2.IntegrationsClientService/ListXxxxModels HTTP/2.0", upstream: "grpc://10.3.99.246:8081", host: "apigw.dev.xxxx-yx.tt:9080"
apisix 127.0.0.6 - - [25/Oct/2022:07:10:36 +0000] apigw.dev.xxxx-yx.tt:9080 "POST /xx.yyyyy.integrations.v2.IntegrationsClientService/ListXxxxModels HTTP/2.0" 502 154 1.006 "-" "grpc-go/1.37.0" 10.3.99.246:8081 502 0.999 "grpc://apigw.dev.xxxx-yx.tt:9080"

I hope that this will help you. We used the test docker image that you provided above.

bzp2010 commented 2 years ago

@svilenvul

I notice that you are trying to proxy a gRPC service, which means the request has to use HTTP/2, normally over TLS. The mTLS provided by the sidecar is no substitute for this TLS, because it is transparent to both the client and the server. From the server's point of view, it is serving H2C (cleartext HTTP/2), which is not the same as HTTP/2 over TLS.

For APISIX, if you want to use HTTP/2 over plaintext HTTP, i.e. H2C, you have to turn on that switch manually, and note that it cannot coexist with regular HTTP on the same port.

My suggestion for now is to try proxying a normal HTTP service first. If that works, try the following scenario: turn on H2C support for APISIX and expose it in the Kubernetes Service.

bzp2010 commented 2 years ago

How to turn on H2C

  1. Modify the ConfigMap resource of APISIX, and add a new port to listen to. https://github.com/apache/apisix/blob/master/conf/config-default.yaml#L24-L27
  2. Modify the Deployment resource of APISIX to expose that new port with HTTP2 enabled.
  3. Create a new Kubernetes Service or modify the previous one to expose the port via service. Note that you must set the appProtocol in the service port to HTTP2 or gRPC to meet Istio's requirements for application protocol identification.
  4. Access APISIX through that new port on the service.
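
Steps 1 and 3 above could be sketched as follows. This assumes the list form of `apisix.node_listen` with a per-port `enable_http2` switch, as in the APISIX 2.x default configuration linked in step 1. Port 9081, the Service name, and the pod selector label are hypothetical values for illustration and need to match your own deployment.

```shell
# Step 1 (sketch): node_listen stanza for the APISIX config, adding a second
# port that speaks HTTP/2 in cleartext (H2C). Port 9081 is an arbitrary choice.
cat > apisix-node-listen.yaml <<'EOF'
apisix:
  node_listen:
    - port: 9080
      enable_http2: false
    - port: 9081
      enable_http2: true
EOF

# Step 3 (sketch): a Service exposing the new port. appProtocol tells Istio
# which application protocol to expect on this port.
cat > apisix-h2c-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: apisix-gateway-h2c            # hypothetical name
spec:
  selector:
    app.kubernetes.io/name: apisix    # assumed pod label; match your Deployment
  ports:
    - name: grpc
      port: 9081
      targetPort: 9081
      appProtocol: grpc
EOF

# Merge the first fragment into the APISIX ConfigMap, then apply the Service:
#   kubectl apply -n <YOUR NAMESPACE> -f apisix-h2c-service.yaml
```

For step 2, the Deployment also needs a matching `containerPort` entry for the new port so that the Service has something to target.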

Other

Another option is to configure TLS on APISIX itself, that is, TLS on top of Istio mTLS. Clients then use TLS to communicate with the server even though the transparent mTLS encryption is also in place.

juzhiyuan commented 1 year ago

Hey @svilenvul @marziman, after trying https://github.com/apache/apisix/issues/7377#issuecomment-1303022226, does your team have any feedback on this? :)

github-actions[bot] commented 9 months ago

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions[bot] commented 9 months ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.