Open · ray1888 opened 10 months ago
Could you provide the corresponding route configuration? Do you have any WASM plugin enabled for this route?
Through Higress, the curl request:
curl 'http://log.gitee.work/assets/login/js/chunk-vendors.705a060b.js' \
  -H 'Accept: */*' \
  -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://log.gitee.work/login' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' \
  --compressed --insecure
Through the NodePort, the curl request:
curl 'http://log.gitee.work:20571/assets/login/js/chunk-vendors.705a060b.js' \
  -H 'Accept: */*' \
  -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://log.gitee.work:20571/login' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' \
  --compressed --insecure
Sure,
svc for nodePort
@ray1888 The Higress log field response_code is 200, indicating that the response was returned by the upstream service. The error is net::ERR_CONTENT_LENGTH_MISMATCH, so it may be that your upstream service did not return a complete response.
Did you use nginx as the upstream?
cc https://github.com/xhlwill/blog/issues/17#issuecomment-848631589
Yes, the upstream svc is nginx.
I have checked the nginx log, but it looks normal on the nginx side.
I found the upstream host is 10.244.18.92:8080 in your log. Try this one:
curl 'http://10.244.18.92:8080/assets/login/js/chunk-vendors.705a060b.js' \
  -H 'Host: log.gitee.work' \
  -H 'Accept: */*' \
  -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://log.gitee.work:20571/login' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' \
  --compressed --insecure
Inside the Kubernetes cluster, the curl response is normal.
@johnlanni So I think it is not an nginx problem. I have checked the pod (its IP is 10.244.18.92): a direct request gets a normal response, and even going through the Kubernetes NodePort Service returns a normal response body as well.
@ray1888 Please run tcpdump in the higress-gateway pod (you may need to run it on the node instead, or switch the pod user to root):
tcpdump -i any host 10.244.18.92 and port 8080 -A
Then access the JS file from the browser; you will see the full request headers in the tcpdump output. Then try curl against 10.244.18.92:8080 with those request headers, and find which header makes nginx return 200 without a response body.
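A minimal sketch of that capture workflow; the namespace and label below are assumptions for a default install, so adjust them to your setup:

```bash
# Find the gateway pod and the node it is scheduled on
# (namespace and label are assumptions).
kubectl -n higress-system get pod -l app=higress-gateway -o wide

# Option 1: capture inside the gateway pod, if tcpdump is available there
# and the container user has enough privileges (e.g. root).
kubectl -n higress-system exec -it <higress-gateway-pod> -- \
  tcpdump -i any host 10.244.18.92 and port 8080 -A

# Option 2: run the same capture on the node that hosts the gateway pod;
# use "-w upstream.pcap" instead of "-A" if you want a pcap file to share.
tcpdump -i any host 10.244.18.92 and port 8080 -A
```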
Can I run the tcpdump on the gateway node, or only on the node where the pod is running?
@johnlanni Is this screenshot OK, or do I need to post the pcap file?
@johnlanni I tried deleting the headers one by one, and none of them affects the response when curling 10.244.18.92:8080 directly.
@johnlanni I also tried curling the destination service's JS file from inside the Higress gateway, and that works as well, but the response still cannot make it through Higress to the user's browser.
Did you try curl with the headers from that output?
From the tcpdump output you can also see that nginx did not return the response body, which proves it was nginx, not Higress, that discarded the response body.
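If the capture was saved to a pcap file, one way (among others) to confirm from the trace that nginx returned no body is with tshark; this is a small sketch, assuming tshark is installed and the file is named upstream.pcap:

```bash
# List upstream HTTP responses with their status, Content-Length and
# Transfer-Encoding; a 200 response with no body should stand out here.
tshark -r upstream.pcap -Y 'http.response && tcp.srcport == 8080' \
  -T fields -e frame.time_relative -e http.response.code \
  -e http.content_length -e http.transfer_encoding
```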
Yes, I tried that with curl against nginx; deleting the headers one by one does not affect the response.
@johnlanni I have a new clue. I ran curl from a Kubernetes node against the hostname; the results are below. I don't understand why the connection is closed before the response data arrives. This traffic goes through the Higress gateway.
@ray1888 As you can see from the tcpdump output above, nginx did not return the response body. I think we need to find out the reason first.
@johnlanni When I curl directly from the node, the connection is not closed before the data transfer finishes. Doesn't that prove it is not an nginx problem? Could the request be exceeding Higress's default timeout?
@ray1888 The response code is 200; if it had timed out, the response code would be 504.
Could you remote in over DingTalk to help troubleshoot this?
@ray1888 Sure, you can find me in the DingTalk community group; my nickname is 澄潭.
@johnlanni I asked about company policy: remote access is not allowed, but I redirected the tcpdump output to a file and exported it. response.zip There are quite a few re-ACKs and duplicate ACKs in it. Another odd thing: as I understand it, what is being proxied here should be a layer-7 protocol, yet the tcpdump capture only shows layer-4 traffic, i.e. the requests and responses to port 8080 of the target pod.
And as the figure below shows, when the gateway requests the JS repeatedly, duplicate ACKs cause continuous retransmissions; after several failed retransmissions, the gateway side actively sends an RST, which resets the connection.
What extra hops are there between the gateway pod and nginx?
The gateway pod is on Node2; Node1 is the node that the domain name resolves to.
But I also tried APISIX as the proxy with the same configuration, and it did not show this issue.
@ray1888 I looked at the packet capture: there are a lot of TCP retransmissions, starting right from connection establishment. Does this problem only occur on JS requests, with other requests unaffected? Below is the request that Higress sends to the backend; please test it with curl from inside the higress-gateway pod:
GET /assets/login/js/chunk-ff542364.2fa1ed4b.js HTTP/1.1
host: log.gitee.work
accept-encoding: deflate, gzip
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
accept-language: zh-CN,zh;q=0.9,en;q=0.8
cache-control: no-cache
pragma: no-cache
purpose: prefetch
referer: http://log.gitee.work/login
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
x-forwarded-for: 10.244.25.64
x-forwarded-proto: http
x-envoy-internal: true
x-request-id: 99ef8214-31bc-40a4-8062-802aa1051eef
x-envoy-decorator-operation: gitee-one-front.gitee.svc.cluster.local:80/assets/*
x-envoy-expected-rq-timeout-ms: 3000
x-envoy-attempt-count: 1
x-b3-traceid: 8ee55d9ddd763fc3b66f99e52a34a6d2
x-b3-spanid: b66f99e52a34a6d2
x-b3-sampled: 0
req-start-time: 1705976499465
original-host: log.gitee.work
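A rough sketch of replaying this request from inside the gateway pod is below; the namespace, the pod placeholder, and the presence of curl in the gateway image are assumptions, while the URL and headers are copied from the dump above:

```bash
# Replay the captured request against the upstream pod directly
# (placeholders are assumptions; adjust for your install).
kubectl -n higress-system exec -it <higress-gateway-pod> -- \
  curl -v 'http://10.244.18.92:8080/assets/login/js/chunk-ff542364.2fa1ed4b.js' \
    -H 'Host: log.gitee.work' \
    -H 'Accept-Encoding: deflate, gzip' \
    -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7' \
    -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8' \
    -H 'Cache-Control: no-cache' \
    -H 'Pragma: no-cache' \
    -H 'Purpose: prefetch' \
    -H 'Referer: http://log.gitee.work/login' \
    -o /dev/null
# If curl is not available in the gateway container, run the same command
# from the node that hosts the gateway pod.
```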
Yes, only the JS is affected; everything else, including API forwarding, is fine.
@ray1888 Higress does not inspect particular responses and treat them specially. From the capture, there is packet loss, retransmission, and TCP packet reordering between Higress and the backend starting from connection establishment, which points strongly at your network environment (this JS response is fairly large and may be triggering the network problem). You could try deploying a Higress instance with kind on your local laptop and serving the exact same response to test; the problem most likely will not reproduce there.
Looking more closely, these are not normal retransmissions: the SYN packet is retransmitted without waiting for the RTO (default minimum 200 ms), after only 0.02 ms, and the later data packets show the same pattern. It is not a retransmission triggered by waiting too long for an ACK; each packet is simply sent twice onto the network.
Could you capture another trace, between the browser and Higress, and send it to me? This error is also strange: net::ERR_INCOMPLETE_CHUNKED_ENCODING
Let me try that first. Later I will deploy a kind cluster and try accessing the JS service through a NodePort.
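For reference, a local reproduction along those lines might look roughly like this; the Helm repo URL, chart name, and namespace are assumptions based on the Higress quickstart, so check the official installation docs before running:

```bash
# Spin up a throwaway cluster and install Higress into it (sketch only).
kind create cluster --name higress-test
helm repo add higress.io https://higress.io/helm-charts
helm install higress higress.io/higress -n higress-system --create-namespace
# Then deploy the same nginx service and Ingress, and fetch the large JS
# file through the local gateway to see whether the truncation reproduces.
```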
I just looked into this part: the RTO on the Calico side is very small. The pod captured by tcpdump uses a Calico pod IP; I am not sure whether that could have an impact.
Have you enabled the WAF plugin, or any other plugin that buffers the response body? A user recently ran into a similar problem, and it was caused by the WAF plugin.
It can be fixed by increasing the global parameter downstream.connectionBufferLimits.
The WAF plugin buffers the request body and the response body; if a body is larger than the downstream.connectionBufferLimits value in the global configuration, the request or response becomes abnormal.
It is also not recommended to set downstream.connectionBufferLimits too large, as that can lead to high gateway memory usage when network transfer is slow.
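As a rough sketch (not an authoritative configuration), raising the limit could look like the following, assuming a standard Helm install where the global options live under the higress key of the higress-config ConfigMap in the higress-system namespace; verify the names and a sensible value against your installation and the official docs:

```bash
# Open the global Higress configuration for editing
# (namespace and ConfigMap name are assumptions based on a default install).
kubectl -n higress-system edit configmap higress-config

# Inside data.higress, set something along these lines:
#
#   downstream:
#     connectionBufferLimits: 1048576   # bytes; large enough for the biggest buffered body,
#                                       # but not excessive, to keep gateway memory bounded
```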
If you are reporting any crash or any potential security issue, do not open an issue in this repo. Please report the issue via ASRC(Alibaba Security Response Center) where the issue will be triaged appropriately.
Ⅰ. Issue Description
The problem is: while trying out Higress as our team's new ingress gateway, some of the JS files served by an nginx service are proxied back as empty files of size 0.
Ⅱ. Describe what happened
The gateway log is as above, and the Chrome console reports an error indicating that the request for the JS resource does not return a valid body; the body_sent field in the gateway log is 0.
Ⅲ. Describe what you expected to happen
The picture below shows a direct request through the Service NodePort; it responds correctly, with no console errors.
Ⅳ. How to reproduce it (as minimally and precisely as possible)
Ⅴ. Anything else we need to know?
Ⅵ. Environment: