apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0

help request: open keepalive_pool also occurs many three handshaking #7631

Closed jujiale closed 1 year ago

jujiale commented 2 years ago

Description

Hello, I am testing APISIX v2.12 with apisix-base installed, and I have run into a problem. My upstream is an OpenResty server (172.25.xxx.xxx) whose keepalive_timeout is 10s. When I visit http://172.25.xxx.xxx:9000/test directly, it returns a simple string. The following is my route config:

{
    "id": "414195977881125816",
    "create_time": 1656409641,
    "update_time": 1660009873,
    "uri": "/test",
    "name": "jujiale-test",
    "methods": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS", "CONNECT", "TRACE"],
    "plugins": {
        "prometheus": {
            "disable": false,
            "prefer_name": true
        }
    },
    "upstream": {
        "nodes": {
            "172.25.xxx.xxx:9000": 1
        },
        "timeout": {
            "connect": 6,
            "send": 6,
            "read": 6
        },
        "type": "roundrobin",
        "schtheme": "http",
        "pass_host": "pass",
        "keepalive_pool": {
            "idle_timeout": "60s",
            "requests": 1000,
            "size": 320
        }
    },
    "status": 1
}


With the above config, I captured packets on the APISIX server using "sudo tcpdump -i any host 172.25.xxx.xxx and dst port 9000 -w 111.pcap", then requested the uri "/test" through APISIX with curl. The requests reached the upstream successfully. But when I opened 111.pcap in Wireshark, it showed many TCP three-way handshakes. I then reduced idle_timeout from 60s to 5s so that it is smaller than the 10s configured in OpenResty, and there were still many three-way handshakes. I am confused by this.

I found that the following source code in apisix/balancer.lua is invoked when I visit the uri "/test": https://github.com/apache/apisix/blob/f118f5ea7a5d96023a7bd546545f7c1ad6486495/apisix/balancer.lua#L299

local idle_timeout = keepalive_pool.idle_timeout
local size = keepalive_pool.size
local requests = keepalive_pool.requests

pool_opt.pool_size = size
local ok, err = balancer.set_current_peer(server.host, server.port,
                                          pool_opt)
if not ok then
    return ok, err
end

return balancer.enable_keepalive(idle_timeout, requests)

The call "balancer.enable_keepalive(idle_timeout, requests)" does not seem to take effect.

Everything above was configured through apisix-dashboard. I then removed keepalive_pool from my route config so that the default from config-default.yaml applies (nginx_config.http.upstream.keepalive_timeout is 60s). Checking with Wireshark again, the behavior is the same as described above.

So I want to know: does setting idle_timeout to 0 disable keepalive, and does setting idle_timeout to a non-zero value enable keepalive_pool?

Environment

spacewander commented 2 years ago

@kingluo Would you like to reproduce this keepalive_pool issue? Thanks!

kingluo commented 2 years ago

@jujiale

  1. Where did you get the config? From the admin API? Why does the output look a bit odd?
        "type": "roundrobin",
        "schtheme": "http",
        "pass_host": "pass",
        "keepalive_pool": {
            "idle_timeout": "60s",
            "requests": 1000,
            "size": 320
        }

    The key schtheme should be scheme, and idle_timeout should be an integer.

Could you get the route config from the admin API?

For example:

curl -s "http://127.0.0.1:10080/apisix/admin/routes/test_keepalive" -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" | python -m json.tool

{
    "action": "get",
    "count": 1,
    "node": {
        "key": "/apisix/routes/test_keepalive",
        "value": {
            "create_time": 1660044146,
            "id": "test_keepalive",
            "priority": 0,
            "status": 1,
            "update_time": 1660054060,
            "upstream": {
                "hash_on": "vars",
                "keepalive_pool": {
                    "idle_timeout": 60,
                    "requests": 1000,
                    "size": 320
                },
                "nodes": {
                    "54.144.64.232": 1
                },
                "pass_host": "pass",
                "scheme": "http",
                "timeout": {
                    "connect": 6,
                    "read": 6,
                    "send": 6
                },
                "type": "roundrobin"
            },
            "uri": "/get"
        }
    }
}
  2. Could you provide the pcap file (between APISIX and the upstream) for reference? Which side closes the connection? Check whether the upstream server returns the header Connection: keep-alive.

I tried it in a Docker environment (apache/apisix:2.12.0-centos) and could not reproduce your issue. (I also tried httpbin.org as the upstream, with no problem either.)

So please give more information about your test flow.
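The two config problems pointed out in this comment (a misspelled key and a non-numeric idle_timeout) can be caught with a small check before the route is submitted. The following is only a sketch: the function name check_upstream is hypothetical, and KNOWN_UPSTREAM_KEYS lists just the keys that appear in this thread, not the full APISIX upstream schema.

```python
# Keys seen in the upstream objects in this thread. This is an assumption
# for illustration only, NOT the complete APISIX upstream schema.
KNOWN_UPSTREAM_KEYS = {
    "nodes", "timeout", "type", "scheme", "pass_host",
    "keepalive_pool", "hash_on", "key",
}


def check_upstream(upstream):
    """Return a list of problems found in an upstream dict (hypothetical helper)."""
    problems = []

    # Flag keys we do not recognise, e.g. the typo "schtheme".
    for key in upstream:
        if key not in KNOWN_UPSTREAM_KEYS:
            problems.append(f"unknown upstream key: {key!r}")

    # keepalive_pool values should be numbers, not strings such as "60s".
    pool = upstream.get("keepalive_pool", {})
    for field in ("idle_timeout", "requests", "size"):
        value = pool.get(field)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(
                f"keepalive_pool.{field} should be numeric, got {value!r}"
            )
    return problems


if __name__ == "__main__":
    posted = {
        "type": "roundrobin",
        "schtheme": "http",
        "pass_host": "pass",
        "keepalive_pool": {"idle_timeout": "60s", "requests": 1000, "size": 320},
    }
    for problem in check_upstream(posted):
        print(problem)
```

Run against the fragment quoted above, it reports the schtheme key and the "60s" value; it says nothing about whether APISIX itself would accept or reject such a config.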

jujiale commented 2 years ago

@kingluo thanks for your reply. The idle_timeout=60s value was only something I tried during testing, because I saw that the timeout in config-default.yaml has a time unit. I have since removed the "s" from "60s". With idle_timeout = 60 set through apisix-dashboard, the three-way handshakes still occur. I will provide more information tomorrow, thanks.

tokers commented 2 years ago

@jujiale In your case, what is the QPS to this upstream?

jujiale commented 2 years ago

@jujiale In your case, what is the QPS to this upstream?

@tokers I am just sending requests with curl, about one request every one or two seconds.

tzssangglass commented 2 years ago

Hi @jujiale, to make it easier for others to reproduce, it is best to provide the following information accurately:

  1. The configuration of the route itself (how you add the route with curl, or how this route is stored in etcd).
  2. What is the upstream (OpenResty) config?
  3. The full curl command used to simulate a client sending a request.
  4. The 111.pcap file.

And I would also like to ask, is 9000 the port you set for APISIX proxy client requests?

jujiale commented 2 years ago

@tzssangglass thanks for the reminder. @kingluo hello, I just ran a test with httpbin.

  1. The following is my config in etcd:

{
    "id": "420379229796959160",
    "create_time": 1660095146,
    "update_time": 1660096960,
    "uri": "/get",
    "name": "httpbin_test",
    "methods": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS", "CONNECT", "TRACE"],
    "upstream": {
        "nodes": {
            "httpbin.org:80": 1
        },
        "timeout": {
            "connect": 6,
            "send": 6,
            "read": 6
        },
        "type": "chash",
        "hash_on": "vars",
        "key": "remote_addr",
        "scheme": "http",
        "pass_host": "pass",
        "keepalive_pool": {
            "idle_timeout": 60,
            "requests": 1000,
            "size": 1
        }
    },
    "status": 1
}


  2. I use the command "sudo tcpdump -i any host httpbin.org -w 555555.pcap" to capture packets.

  3. I use "curl http://my_apisix_server_ip:port/get" to send a request.

  4. The following picture shows the captured packets: image

If this information is not enough, I will add more details. Thanks.

jujiale commented 2 years ago

Also, no matter whether I use chash or roundrobin, or set keepalive_pool.size > 1 (for example 320, the default), there are still three-way handshakes like in the picture above.

kingluo commented 2 years ago

@jujiale When you use a domain name in the upstream config, each request to the upstream may use a different resolved IP address. As your pcap shows, APISIX connects to a different IP address of httpbin.org each time. FYI, APISIX creates a separate keepalive pool for each distinct address+port combination. Could you use an IP address in the upstream config, e.g. 54.144.64.232, and try again?
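To illustrate why rotating resolved addresses defeats connection reuse, here is a toy model of pools keyed by (ip, port). It is not APISIX code: count_handshakes is a hypothetical helper, requests are assumed to be strictly sequential, and idle timeouts and upstream-initiated closes are ignored. The 192.0.2.x addresses are documentation placeholders, not real httpbin.org addresses.

```python
def count_handshakes(targets, pool_size=1):
    """Count new TCP connections when idle connections are pooled per (ip, port).

    Toy model: sequential requests, no idle timeout, upstream never closes.
    """
    idle = {}        # (ip, port) -> number of idle connections in that pool
    handshakes = 0
    for target in targets:
        if idle.get(target, 0) > 0:
            idle[target] -= 1          # reuse an idle connection from this pool
        else:
            handshakes += 1            # nothing to reuse: new three-way handshake
        # After the response, the connection goes back to its own pool,
        # unless that pool is already full.
        idle[target] = min(idle.get(target, 0) + 1, pool_size)
    return handshakes


if __name__ == "__main__":
    fixed_ip = [("54.144.64.232", 80)] * 4
    rotating = [("192.0.2.1", 80), ("192.0.2.2", 80),
                ("192.0.2.3", 80), ("192.0.2.4", 80)]
    print("fixed address:   ", count_handshakes(fixed_ip))
    print("rotating address:", count_handshakes(rotating))
```

In this model, four requests to one fixed address need a single handshake, while four requests that each resolve to a different address need four, because a connection sitting in one pool cannot serve a request bound for another address.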

jujiale commented 2 years ago

@kingluo I tested with your suggestion.

  1. My config in etcd:

    {
        "id": "420379229796959160",
        "create_time": 1660095146,
        "update_time": 1660098809,
        "uri": "/get",
        "name": "httpbin_test",
        "methods": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS", "CONNECT", "TRACE"],
        "upstream": {
            "nodes": {
                "54.144.64.232:80": 1
            },
            "timeout": {
                "connect": 6,
                "send": 6,
                "read": 6
            },
            "type": "chash",
            "hash_on": "vars",
            "key": "remote_addr",
            "scheme": "http",
            "pass_host": "pass",
            "keepalive_pool": {
                "idle_timeout": 60,
                "requests": 1000,
                "size": 1
            }
        },
        "status": 1
    }


  2. I use the command "sudo tcpdump -i any host 54.144.64.232 -w 555555.pcap" to capture packets.

  3. I use "curl http://my_apisix_server_ip:port/get" to send a request.

  4. The following is the packet picture. As you can see, I sent 4 requests, but two three-way handshakes occurred: image

jujiale commented 2 years ago

@kingluo what I find strange is that whether the request header contains "Connection:keep-alive" affects the test result, for example with idle_timeout=0.

  1. My etcd config:

    {
        "id": "420379229796959160",
        "create_time": 1660095146,
        "update_time": 1660116335,
        "uri": "/get",
        "name": "httpbin_test",
        "methods": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS", "CONNECT", "TRACE"],
        "upstream": {
            "nodes": {
                "54.144.64.232:80": 1
            },
            "timeout": {
                "connect": 6,
                "send": 6,
                "read": 6
            },
            "type": "chash",
            "hash_on": "vars",
            "key": "remote_addr",
            "scheme": "http",
            "pass_host": "pass",
            "keepalive_pool": {
                "idle_timeout": 0,
                "requests": 1000,
                "size": 1
            }
        },
        "status": 1
    }

  2. I use Postman to send requests to my APISIX server so that I can control the request headers conveniently.

  3. If the request header does not contain "Connection:keep-alive", the result is as below: image

  4. If the request header contains "Connection:keep-alive", the result is as below: image

kingluo commented 2 years ago

@jujiale The Connection:keep-alive should be set in response header but not request header. Could you show the HTTP GET content and response content from your wireshark?

jujiale commented 2 years ago

should be set in response header but not request header. Could you show the HTTP GET content and response content from your

@kingluo image

kingluo commented 2 years ago

@jujiale That request was not sent to APISIX; check the Server header. Also, Connection: keep-alive is in the response header, so it does not matter whether you use Postman or not. For HTTP/1.1, the connection is kept alive according to the response, and keepalive is normally enabled by default for HTTP/1.1.

kingluo commented 2 years ago

@jujiale Could you show the differences between your correct pcap and your wrong pcap (from your pictures)? Check whether the request and/or the response is different.

jujiale commented 2 years ago

@jujiale Could you show the differences between your correct pcap and your wrong pcap (from your pictures)? Check whether the request and/or the response is different.

@kingluo

The following is the correct pcap (idle_timeout=0), but I sent 4 requests and there are only 3 three-way handshakes: image

The following is the wrong pcap (idle_timeout=0); it still looks like a persistent connection: image

kingluo commented 2 years ago

@jujiale Why is the server gunicorn rather than APISIX? And why did you set idle_timeout=0? That is different from what you said in https://github.com/apache/apisix/issues/7631#issuecomment-1210085555. Could you keep the test configuration consistent between your posts?

jujiale commented 2 years ago

@kingluo

Sorry, I need to test the keepalive function in APISIX, so I also need to test other situations. What I described in https://github.com/apache/apisix/issues/7631#issuecomment-1210085555 is just one situation that did not behave as I expected. I thought the other cases mentioned above might provide more details; perhaps they are caused by the same problem. Thanks.

tokers commented 2 years ago

@jujiale The Connection:keep-alive should be set in response header but not request header.

Could you show the HTTP GET content and response content from your wireshark?

If the Connection header is missing, keepalive is enabled by default in HTTP/1.1.
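The default described here can be written down as a small decision function. This is a simplified sketch of the HTTP persistence rules, not APISIX or nginx code; is_persistent is a hypothetical name, and the HTTP/1.0 branch ignores the extra conditions the specification places on proxies.

```python
def is_persistent(http_version, connection_header=None):
    """Decide whether a connection stays open after a message (simplified).

    http_version: "HTTP/1.0" or "HTTP/1.1"
    connection_header: raw value of the Connection header, or None if absent
    """
    tokens = set()
    if connection_header:
        # Connection is a comma-separated, case-insensitive token list.
        tokens = {t.strip().lower() for t in connection_header.split(",")}

    if "close" in tokens:
        return False                    # explicit close always wins
    if http_version == "HTTP/1.1":
        return True                     # persistent by default, header or not
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens   # HTTP/1.0 must opt in
    return False                        # unknown version: assume not persistent


if __name__ == "__main__":
    print(is_persistent("HTTP/1.1"))                # no Connection header
    print(is_persistent("HTTP/1.1", "close"))
    print(is_persistent("HTTP/1.0", "keep-alive"))
```

Under these rules an HTTP/1.1 message with no Connection header is persistent, so adding "Connection: keep-alive" to an HTTP/1.1 request does not by itself change whether the connection can be reused.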

tzssangglass commented 2 years ago

what I find strange is that whether the request header contains "Connection:keep-alive" affects the test result, for example with idle_timeout=0.

Setting idle_timeout to 0 is equivalent to setting keepalive_timeout to 0, which has the same effect as disabling keepalive. ref: https://github.com/apache/apisix/issues/6188#issuecomment-1020751797

tzssangglass commented 2 years ago

Sorry, I need to test the keepalive function in APISIX, so I also need to test other situations. What I described in #7631 (comment) is just one situation that did not behave as I expected.

Instead of providing information on multiple cases, I think we should focus on: https://github.com/apache/apisix/issues/7631#issuecomment-1210085555

To be clear: in the case of https://github.com/apache/apisix/issues/7631#issuecomment-1210085555, is the request path curl --> APISIX --> 54.144.64.232:80, with no gunicorn involved?

If, in the case of https://github.com/apache/apisix/issues/7631#issuecomment-1210085555, the request path is not curl --> APISIX --> 54.144.64.232:80 and gunicorn appears in the request path, such as curl --> gunicorn --> APISIX --> 54.144.64.232:80, curl --> APISIX --> gunicorn --> 54.144.64.232:80, or curl --> gunicorn --> 54.144.64.232:80, then please explain it clearly. (I am just worried that this is the case and is causing a gap in our communication.)

tzssangglass commented 2 years ago

I thought the other cases mentioned above might provide more details; perhaps they are caused by the same problem.

In fact, we are currently only interested in the case of curl --> APISIX --> 54.144.64.232:80: whether abnormal repeated handshakes occur with HTTP/1.1 when keepalive_pool is enabled by default. That would mean keepalive_pool is not working, which is abnormal behavior. If it does exist, we need to fix it.

github-actions[bot] commented 1 year ago

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.