F-Stack / f-stack

F-Stack is a high-performance user-space network development kit based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.
http://www.f-stack.org

Multi-core single-interface nginx reverse proxy, wrk does not work properly #380

Open tyheist opened 5 years ago

tyheist commented 5 years ago

In issue #62, @whl739 said that f-stack nginx supports reverse proxying. I tested with versions 1.11/1.12 and the master/dev branches; the result is the same. My configuration: f-stack.conf.txt nginx.conf.txt

tyheist commented 5 years ago

f-stack.conf

[dpdk]
## Hexadecimal bitmask of cores to run on.
## 0xc000 = binary 1100 0000 0000 0000, i.e. lcores 14 and 15
lcore_mask=0xc000
channel=4
promiscuous=1
numa_on=1
## TCP segment offload, default: disabled.
tso=0
## HW vlan strip, default: enabled.
vlan_strip=1

port_list=0

## Port config section
## Correspond to dpdk.port_list's index: port0, port1...
[port0]
addr=1.1.1.2
netmask=255.255.255.0
broadcast=1.1.1.255
gateway=1.1.1.1

## lcore list used to handle this port
## the format is same as port_list
lcore_list=14,15

## Packet capture path, this will hurt performance
#pcap=./a.pcap
#pcap=/home/ty/tcp6.pcap

## Kni config: if enabled and method=reject,
## all packets that do not belong to the following tcp_port and udp_port
## will transmit to kernel; if method=accept, all packets that belong to
## the following tcp_port and udp_port will transmit to kernel.
#[kni]
#enable=1
#method=reject
## The format is same as port_list
#tcp_port=80,443
#udp_port=53

## FreeBSD network performance tuning configurations.
## Most native FreeBSD configurations are supported.
[freebsd.boot]
hz=100

## Block out a range of descriptors to avoid overlap
## with the kernel's descriptor space.
## You can increase this value according to your app.
fd_reserve=1024

kern.ipc.maxsockets=262144

net.inet.tcp.syncache.hashsize=4096
net.inet.tcp.syncache.bucketlimit=100

net.inet.tcp.tcbhashsize=65536

kern.ncallout=262144

[freebsd.sysctl]
kern.ipc.somaxconn=32768
kern.ipc.maxsockbuf=16777216

net.link.ether.inet.maxhold=5

net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.sendspace=16384
net.inet.tcp.recvspace=8192
net.inet.tcp.nolocaltimewait=1
net.inet.tcp.cc.algorithm=cubic
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.sack.enable=1
net.inet.tcp.blackhole=1
net.inet.tcp.msl=2000
net.inet.tcp.delayed_ack=0

net.inet.udp.blackhole=1
net.inet.ip.redirect=0

nginx.conf

# root account is necessary.
user  root;
# should be equal to the lcore count of `dpdk.lcore_mask` in f-stack.conf.
## 2 lcores
worker_processes  2;
# path of f-stack configuration file, default: $NGX_PREFIX/conf/f-stack.conf.
fstack_conf f-stack.conf;

events {
    worker_connections  102400;
    use kqueue;
}

daemon off;

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        off;
    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    server {
        listen 99;
        location /1byte {
            access_log off;
            proxy_pass http://1.1.1.3/1byte;
        }
    }
}
tyheist commented 5 years ago

I changed the code: I moved the packet-capture setup from per-port to per-queue, so each queue's traffic is dumped to its own file.

ff_dpdk_if.c, init_lcore_conf(void):

        printf("lcore: %u, port: %u, queue: %u\n", lcore_id, port_id, queueid);
        uint16_t nb_rx_queue = lcore_conf.nb_rx_queue;
        lcore_conf.rx_queue_list[nb_rx_queue].port_id = port_id;
        lcore_conf.rx_queue_list[nb_rx_queue].queue_id = queueid;
        lcore_conf.nb_rx_queue++;

        lcore_conf.tx_queue_id[port_id] = queueid;
        lcore_conf.tx_port_id[lcore_conf.nb_tx_port] = port_id;
        lcore_conf.nb_tx_port++;

        /* capture packets per queue: append the queue id to the configured
         * pcap path, so pcap=./a.pcap yields ./a.pcap-0.pcap, ./a.pcap-1.pcap */
        char tmp[128] = {0};
        snprintf(tmp, sizeof(tmp), "%s-%d.pcap", pconf->pcap, queueid);
        lcore_conf.pcap[port_id] = strdup(tmp);
        ff_enable_pcap(lcore_conf.pcap[port_id]);
        //lcore_conf.pcap[port_id] = pconf->pcap;
        lcore_conf.nb_queue_list[port_id] = pconf->nb_lcores;
    }

    if (lcore_conf.nb_rx_queue == 0) {
        rte_exit(EXIT_FAILURE, "lcore %u has nothing to do\n", lcore_id);
    }

    return 0;
}

I found that f-stack nginx sends the SYN to the back-end on queue 1 but receives the SYN/ACK from the back-end on queue 0.

-- netstat status check, confirming the SYN is sent on queue 1

root@ty:/home/ty/code/f-stack/tools/sbin# ./netstat -an -P 1
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 1.1.1.2.15754          1.1.1.3.80             SYN_SENT
tcp4       0      0 1.1.1.2.99             1.1.1.4.54206          ESTABLISHED
tcp4       0      0 *.99                   *.*                    LISTEN
tcp4       0      0 *.80                   *.*                    LISTEN
udp4       0      0 *.*                    *.* 

-- queue 0 pcap queue-0

-- queue 1 pcap queue-1


f-stack uses ff_rss_check() to select a source port: the port is one of the hash elements, and the function checks whether the resulting hash value maps to the current queue. So I don't know where it goes wrong.
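For context, here is a minimal, self-contained sketch of the kind of check ff_rss_check() performs, not f-stack's exact code: a software Toeplitz hash over the IPv4/TCP 4-tuple, with a simple modulo standing in for the NIC's RSS indirection table (RETA). The 40-byte key is the well-known Microsoft RSS verification key; the addresses, port range, and 2-queue setup are taken from this thread purely for illustration.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

/* Software Toeplitz hash per the Microsoft RSS specification. */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
              const uint8_t *data, size_t datalen)
{
    uint32_t hash = 0;
    /* 32-bit window over the key, starting at its first 4 bytes */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];
    size_t i;
    int b;

    for (i = 0; i < datalen; i++) {
        for (b = 0; b < 8; b++) {
            if (data[i] & (0x80 >> b))
                hash ^= window;
            /* slide the window left, pulling in the next key bit */
            window <<= 1;
            if (i + 4 < keylen && (key[i + 4] & (0x80 >> b)))
                window |= 1;
        }
    }
    return hash;
}

int main(void)
{
    /* well-known Microsoft RSS verification key (40 bytes) */
    static const uint8_t key[40] = {
        0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
        0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
        0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
        0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
        0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
    };
    /* RSS input for IPv4/TCP: src ip | dst ip | src port | dst port,
     * everything in network byte order (values from this thread) */
    uint32_t src = inet_addr("1.1.1.2");
    uint32_t dst = inet_addr("1.1.1.3");
    uint16_t dport = htons(80);
    unsigned nb_queues = 2;   /* two lcores -> two queues */
    uint8_t tuple[12];
    uint16_t p;

    /* scan candidate source ports and report which queue each one
     * hashes to; picking a port that lands on the caller's own queue
     * is what ff_rss_check() is for */
    for (p = 15750; p < 15760; p++) {
        uint16_t sport = htons(p);
        uint32_t h;

        memcpy(tuple, &src, 4);
        memcpy(tuple + 4, &dst, 4);
        memcpy(tuple + 8, &sport, 2);
        memcpy(tuple + 10, &dport, 2);
        h = toeplitz_hash(key, sizeof(key), tuple, sizeof(tuple));
        printf("sport %u -> hash 0x%08x -> queue %u\n",
               p, h, h % nb_queues);
    }
    return 0;
}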


By the way, I have tested with Intel I210/I350/X710 NICs. With the I210/I350, each request made with curl works, but the wrk test does not. With the X710, curl fails after several requests.
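For reference, the tests were presumably along these lines (illustrative commands; address, port, and path come from the nginx.conf above):

curl http://1.1.1.2:99/1byte
wrk -t2 -c100 -d30s http://1.1.1.2:99/1byte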

whl739 commented 5 years ago

When connecting to a remote side, f-stack uses ff_rss_check to select a local port such that reply packets will be received on the current queue. But it seems there is something wrong in ff_rss_check with these NICs.

tyheist commented 5 years ago

I printed the input RSS hash (from struct rte_mbuf's hash.rss field) and the output RSS hash computed in ff_rss_check(), and found that the output RSS hash differs from the input RSS hash.
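For anyone reproducing this, reading the input hash uses the standard DPDK mbuf API; a minimal sketch (the helper name is illustrative, and hash.rss is only valid when the PKT_RX_RSS_HASH flag is set):

#include <rte_mbuf.h>
#include <stdio.h>

/* Print the RSS hash the NIC computed for a received packet. */
static void
print_input_rss(const struct rte_mbuf *m, uint16_t queue_id)
{
    if (m->ol_flags & PKT_RX_RSS_HASH)
        printf("queue %u: input rss hash 0x%08x\n", queue_id, m->hash.rss);
}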

Checking the <Intel Ethernet Controller X710/XXV710/XL710 Datasheet>, section 7.1.10 says the X710/XXV710/XL710 supports a Microsoft* Toeplitz-based hash and a simple hash; selection between the two schemes is controlled by the HTOEP field of the global GLQF_CTL register.

Section 10.2.2.19.21 HTOEP

I guess my NIC's HTOEP is 0, not 1.

tyheist commented 5 years ago

I did some tests with the X710. According to the X710 datasheet the hash key is 52 bytes, so I used the hash key from datasheet section 7.1.10.1.2. The X710's input and output RSS hashes are now the same.

But when I run wrk, the f-stack nginx (reverse proxy) performance is poor. It is just like with the I350 NIC: curl tests work, but wrk performs poorly.

tyheist commented 5 years ago

Update:

-- NIC I350/I210: input and output RSS hashes are the same.
-- NIC X710 with the 52-byte hash key: input and output RSS hashes are the same.

Now the remaining issue is the wrk test: running multiple lcores the performance is poor, but running a single lcore the performance is perfect.

whl739 commented 5 years ago

F-Stack uses a 40-byte hash key by default; this may be the point. You can use the modifications below to do some tests.

// Intel i40e PMD default RSS key (52 bytes)
static uint8_t default_rsskey_52bytes[52] = {
    0x44, 0x39, 0x79, 0x6b, 0xb5, 0x4c, 0x50, 0x23,
    0xb6, 0x75, 0xea, 0x5b, 0x12, 0x4f, 0x9f, 0x30,
    0xb8, 0xa2, 0xc0, 0x3d, 0xdf, 0xdc, 0x4d, 0x02,
    0xa0, 0x8c, 0x9b, 0x33, 0x4a, 0xf6, 0x4a, 0x4c,
    0x05, 0xc6, 0xfa, 0x34, 0x39, 0x58, 0xd8, 0x55,
    0x7d, 0x99, 0x58, 0x3a, 0xe1, 0x38, 0xc9, 0x2e,
    0x81, 0x15, 0x03, 0x66
};

// in init_port_start(), near line 660:
if (dev_info.hash_key_size == 52) {
    port_conf.rx_adv_conf.rss_conf.rss_key = default_rsskey_52bytes;
    port_conf.rx_adv_conf.rss_conf.rss_key_len = 52;
} else {
    port_conf.rx_adv_conf.rss_conf.rss_key = default_rsskey_40bytes;
    port_conf.rx_adv_conf.rss_conf.rss_key_len = 40;
}
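To confirm which branch a given port takes, the key size the PMD reports can be checked with the standard ethdev API. A minimal sketch (the helper name is illustrative):

#include <rte_ethdev.h>
#include <stdio.h>

/* Print the RSS hash key size the PMD reports for a port:
 * 40 bytes for igb/ixgbe-class NICs, 52 bytes for i40e-class
 * NICs such as the X710 discussed above. */
static void
print_rss_key_size(uint16_t port_id)
{
    struct rte_eth_dev_info dev_info;

    rte_eth_dev_info_get(port_id, &dev_info);
    printf("port %u: rss hash key size = %u bytes\n",
           port_id, dev_info.hash_key_size);
}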
tyheist commented 5 years ago

I have tested it: with the X710 using the 52-byte hash key, the input and output RSS hashes are the same.

The new issue is: when f-stack nginx runs as a reverse proxy and connects to the backend with short connections, the wrk result is poor, only 1000+ req/sec. When connecting to the backend with long connections, the wrk result is 130000+ req/sec.

I guess the above phenomenon is not related to the hash value.
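For reference, the long-connection variant would be configured with nginx upstream keepalive, roughly as below (a sketch based on the server block above; the upstream name and keepalive depth are illustrative):

upstream backend {
    server 1.1.1.3:80;
    keepalive 64;          # idle connections kept open to the backend
}

server {
    listen 99;
    location /1byte {
        access_log off;
        proxy_http_version 1.1;          # upstream keepalive needs HTTP/1.1
        proxy_set_header Connection "";  # drop the default "Connection: close"
        proxy_pass http://backend/1byte;
    }
}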


By the way, I did some performance tests with f-stack nginx and OpenResty nginx running as web servers.

-- one lcore: OpenResty nginx 338087 req/sec, f-stack nginx 337140 req/sec
-- two lcores: OpenResty nginx 449649 req/sec, f-stack nginx 337185 req/sec

zhanghaisen commented 5 years ago

I think the root cause is that the DPDK i40e driver disables the default ATR mode. I encountered the same issue on an XL710 NIC, but nginx worked fine as a reverse proxy on an X520 NIC. When nginx actively connects to the real server, RSS alone cannot steer the packets of that session back to the original queue; that requires the NIC hardware's ATR mode. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/intel-ethernet-flow-director.pdf says: "ATR, the default mode, implements an algorithm that samples transmit traffic and learns to send receive traffic with the corresponding header information (source and destination reversed) to the core where the transmitted data came from."

tyheist commented 5 years ago

@zhanghaisen Thank you for your reply, I will continue testing.

tyheist commented 5 years ago

I tested with an Intel 82599ES NIC; it works fine running as a reverse proxy.