F-Stack / f-stack

F-Stack is a high-performance user-space network development kit based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.
http://www.f-stack.org

TCP socket-based server fails to receive SYN packets from client #548

Open chenxiang2019 opened 4 years ago

chenxiang2019 commented 4 years ago

Hi all,

I tried to implement a simple TCP server using the socket-like APIs provided by f-stack. However, the TCP server does not process the SYN packets sent by a client.

Here is the detailed information:

1. Testbed

I set up the following testbed, which consists of two servers, S1 and S2, that are directly connected. Each server has a two-port 40 Gbps Intel NIC and runs Ubuntu 16.04. The two NIC ports are bound to the DPDK igb_uio driver.

S1 (running the client program, 10.0.0.25) <----> S2 (running the server program, 10.0.0.26)

The server uses 10.0.0.26 and 23456 as its IP address and port, respectively.

2. Server

I wrote the following code based on the example in this repository.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strlen, strerror */
#include <strings.h>    /* bzero */
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <errno.h>
#include <assert.h>

#include <netinet/ip.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>

#include "common.h"

#define MAX_EVENTS 1024

/* kevent set */
struct kevent kevSet;
/* events */
struct kevent events[MAX_EVENTS];
/* kq */
int kq;
int sockfd;

char html[] = "HTTP/1.1 200 OK\r\n";

int create_socket_server()
{
    struct sockaddr_in server;

    int socket_desc = ff_socket(AF_INET, SOCK_STREAM, 0);
    if (socket_desc == -1) {
        perror("Error: ff_socket failed.\n");
        exit(1);
    }

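    /* Zero the struct so that unused fields do not hold stack garbage */
    bzero(&server, sizeof(server));
    server.sin_family = AF_INET;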
    server.sin_addr.s_addr = inet_addr("10.0.0.26");//INADDR_ANY;
    server.sin_port = htons( 23456 );

    if (ff_bind(socket_desc, (const struct linux_sockaddr *)&server, sizeof(server)) < 0) {
        perror("Error: ff_bind failed.\n");
        exit(1);
    }

    return socket_desc;
}

int server_loop(void *arg)
{
    char client_message[1024] = "server response: this is f-stack!";

    /* Wait for events to happen */
    int nevents = ff_kevent(kq, NULL, 0, events, MAX_EVENTS, NULL);
    if (nevents < 0) {
        printf("ff_kevent error:%d, %s\n", errno, strerror(errno));
        return -1;
    }

    for (int i = 0; i < nevents; i++) {
        struct kevent event = events[i];
        int clientfd = (int)event.ident;

        /* Handle disconnect */
        if (event.flags & EV_EOF) {

            /* Simply close socket */
            ff_close(clientfd);

        } else if (clientfd == sockfd) {

            int available = (int)event.data;

            while (available) {

                int client_sock = ff_accept(sockfd, NULL, NULL);
                if (client_sock < 0) {
                    log_warn("Error: ff_accept failed.\n");
                    exit(1);
                }

                int read_size = 0;
                while ( (read_size = ff_read(client_sock, client_message, sizeof(client_message))) > 0 ) {
                    ff_write(client_sock, client_message, strlen(client_message));
                }

                /* Add to event list */
                EV_SET(&kevSet, client_sock, EVFILT_READ, EV_ADD, 0, 0, NULL);

                if (ff_kevent(kq, &kevSet, 1, NULL, 0, NULL) < 0) {
                    printf("ff_kevent error:%d, %s\n", errno,
                        strerror(errno));
                    return -1;
                }

                available--;
            }            

        } else if (event.filter == EVFILT_READ) {

            char buf[256];
            size_t readlen = ff_read(clientfd, buf, sizeof(buf));

            ff_write(clientfd, html, sizeof(html) - 1);

        } else {

            printf("unknown event: %8.8X\n", event.flags);

        }
    }

    return 0;
}

int main(int argc, char *argv[])
{
    ff_init(argc, argv);

    /* Keep the call outside assert() so it still runs when NDEBUG is defined */
    kq = ff_kqueue();
    assert(kq > 0);

    log_warn("Init socket.\n");

    sockfd = create_socket_server();

    int on = 1;
    ff_ioctl(sockfd, FIONBIO, &on);

    int ret = ff_listen(sockfd, MAX_EVENTS);
    if (ret < 0) {
        log_warn("ff_listen failed\n");
        exit(1);
    }

    EV_SET(&kevSet, sockfd, EVFILT_READ, EV_ADD, 0, MAX_EVENTS, NULL);
    /* Update kqueue */
    ff_kevent(kq, &kevSet, 1, NULL, 0, NULL);

    log_warn("Sockfd %d Wait for a Client...\n", sockfd);

    log_warn("Ready to ff_run.\n");

    ff_run(server_loop, NULL);

    return 0;
}

I use the following config.ini when starting the server:

[dpdk]
# Hexadecimal bitmask of cores to run on.
lcore_mask=3

# Number of memory channels.
channel=8

# Specify base virtual address to map.
#base_virtaddr=0x7f0000000000

# Promiscuous mode of nic, default: enabled.
promiscuous=1
numa_on=0

# TX checksum offload skip, default: disabled.
# We need this switch enabled in the following cases:
# -> The application wants to enforce a wrong checksum for testing purposes
# -> Some cards advertise the offload capability but do not calculate the checksum.
tx_csum_offoad_skip=0

# TCP segment offload, default: disabled.
tso=0

# HW vlan strip, default: enabled.
vlan_strip=1

# sleep when no pkts incoming
# unit: microseconds
idle_sleep=0

# sent packet delay time(0-100) while send less than 32 pkts.
# default 100 us.
# if set 0, means send pkts immediately.
# if set >100, will delay 100 us.
# unit: microseconds
pkt_tx_delay=0

# enabled port list
#
# EBNF grammar:
#
#    exp      ::= num_list {"," num_list}
#    num_list ::= <num> | <range>
#    range    ::= <num>"-"<num>
#    num      ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
#
# examples
#    0-3       ports 0, 1,2,3 are enabled
#    1-3,4,7   ports 1,2,3,4,7 are enabled
#
# If bonding is used, configure the bonding port id in port_list
# and do not configure the slave port ids in port_list.
# For example, if port 0 and port 1 are trunked into bonding port 2,
# set `port_list=2` and configure a `[port2]` section
port_list=0,1

# Number of vdev.
nb_vdev=0

# Number of bond.
nb_bond=0

# Each core writes into its own pcap file, which is opened once and closed once it is large enough.
# Supports dumping the first snaplen bytes of each packet.
# If the pcap file is larger than savelen bytes, it is closed and the next file is dumped into.
[pcap]
enable = 0
snaplen= 16777216
savelen= 16777216

# Port config section
# Corresponds to dpdk.port_list's index: port0, port1...
[port0]
addr=10.0.0.26
netmask=255.255.255.0
broadcast=10.0.0.255
gateway=10.0.0.254

[port1]
addr=10.0.0.27
netmask=255.255.255.0
broadcast=10.0.0.255
gateway=10.0.0.254

# lcore list used to handle this port
# the format is same as port_list
lcore_list=0,1

# bonding slave port list used to handle this port
# need to config while this port is a bonding port
# the format is same as port_list
#slave_port_list=0,1

# Packet capture path, this will hurt performance
pcap=./ports.pcap

# Kni config: if enabled and method=reject,
# all packets that do not belong to the following tcp_port and udp_port
# will transmit to kernel; if method=accept, all packets that belong to
# the following tcp_port and udp_port will transmit to kernel.
[kni]
enable=0
method=reject
# The format is same as port_list
tcp_port=1-65535
udp_port=1-65535

# FreeBSD network performance tuning configurations.
# Most native FreeBSD configurations are supported.
[freebsd.boot]
hz=100

# Block out a range of descriptors to avoid overlap
# with the kernel's descriptor space.
# You can increase this value according to your app.
fd_reserve=1024

kern.ipc.maxsockets=262144

net.inet.tcp.syncache.hashsize=4096
net.inet.tcp.syncache.bucketlimit=100

net.inet.tcp.tcbhashsize=65536

kern.ncallout=262144

kern.features.inet6=1
net.inet6.ip6.auto_linklocal=1
net.inet6.ip6.accept_rtadv=2
net.inet6.icmp6.rediraccept=1
net.inet6.ip6.forwarding=0

[freebsd.sysctl]
kern.ipc.somaxconn=32768
kern.ipc.maxsockbuf=16777216

net.link.ether.inet.maxhold=5

net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.sendspace=16384
net.inet.tcp.recvspace=8192
#net.inet.tcp.nolocaltimewait=1
net.inet.tcp.cc.algorithm=cubic
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.sack.enable=1
net.inet.tcp.blackhole=1
net.inet.tcp.msl=2000
net.inet.tcp.delayed_ack=0
net.inet.udp.blackhole=1
net.inet.ip.redirect=0
net.inet.ip.forwarding=0
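
The config comments above give an EBNF grammar for port_list (and say lcore_list, tcp_port and udp_port use the same format). As a rough illustration of how such a list expands, here is a minimal C sketch. The function name parse_id_list is hypothetical and this is not F-Stack's own parser. It also accepts multi-digit numbers, on the assumption that values like tcp_port=1-65535 are meant to be valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Expand a list such as "1-3,4,7" following the grammar quoted in config.ini:
 *   exp ::= num_list {"," num_list}      num_list ::= <num> | <range>
 * Writes at most `cap` ids into `out` and returns how many were written,
 * or -1 on a syntax error, a reversed range, or when `out` is too small.
 * Hypothetical helper for illustration only.
 */
int parse_id_list(const char *s, int *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t n = 0;

    while (*s != '\0') {
        char *end;

        /* Every item must start with a digit. */
        if (*s < '0' || *s > '9')
            return -1;
        long lo = strtol(s, &end, 10);
        long hi = lo;
        s = end;

        /* Optional "-<num>" turns the item into a range. */
        if (*s == '-') {
            s++;
            if (*s < '0' || *s > '9')
                return -1;
            hi = strtol(s, &end, 10);
            s = end;
        }
        if (hi < lo)
            return -1;

        for (long v = lo; v <= hi; v++) {
            if (n == cap)
                return -1;
            out[n++] = (int)v;
        }

        /* Items are separated by a single comma; no trailing comma. */
        if (*s == ',') {
            s++;
            if (*s == '\0')
                return -1;
        } else if (*s != '\0') {
            return -1;
        }
    }
    return (int)n;
}
```

With this sketch, "0,1" (the port_list and lcore_list used here) expands to ids 0 and 1, and the example "1-3,4,7" from the comments expands to 1, 2, 3, 4, 7.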

3. Client

I wrote a simple client based on Python to inject requests to the server:

#!/usr/bin/env python3

import socket
import time

HOST = '10.0.0.26'
PORT = 23456            # The port used by the server

LIMIT = 1000000
cnt = 0

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    start_time = time.time()
    results = []

    while True:
        cur_time = time.time()
        s.sendall(b'Hello, world')
        data = s.recv(1024)
        results.append(time.time()-cur_time)
        cnt = cnt+1
        if cnt > LIMIT : break
    print("--- %s seconds ---" % (time.time() - start_time))

4. Running

First, I start the server on S2, which prints the following logs:

invalid proc_id:-1, use default 0
[dpdk]: lcore_mask=3
[dpdk]: channel=8
[dpdk]: promiscuous=1
[dpdk]: numa_on=1
[dpdk]: tx_csum_offoad_skip=0
[dpdk]: tso=0
[dpdk]: vlan_strip=1
[dpdk]: idle_sleep=0
[dpdk]: pkt_tx_delay=0
[dpdk]: port_list=0,1
[dpdk]: nb_vdev=0
[dpdk]: nb_bond=0
[pcap]: enable=0
[pcap]: snaplen=16777216
[pcap]: savelen=16777216
[port0]: addr=10.0.0.26
[port0]: netmask=255.255.255.0
[port0]: broadcast=10.0.0.255
[port0]: gateway=10.0.0.254
[port1]: addr=10.0.0.27
[port1]: netmask=255.255.255.0
[port1]: broadcast=10.0.0.255
[port1]: gateway=10.0.0.254
[port1]: lcore_list=0,1
[kni]: enable=0
[kni]: method=reject
[kni]: tcp_port=1-65535
[kni]: udp_port=1-65535
[freebsd.boot]: hz=100
[freebsd.boot]: fd_reserve=1024
[freebsd.boot]: kern.ipc.maxsockets=262144
[freebsd.boot]: net.inet.tcp.syncache.hashsize=4096
[freebsd.boot]: net.inet.tcp.syncache.bucketlimit=100
[freebsd.boot]: net.inet.tcp.tcbhashsize=65536
[freebsd.boot]: kern.ncallout=262144
[freebsd.boot]: kern.features.inet6=1
[freebsd.boot]: net.inet6.ip6.auto_linklocal=1
[freebsd.boot]: net.inet6.ip6.accept_rtadv=2
[freebsd.boot]: net.inet6.icmp6.rediraccept=1
[freebsd.boot]: net.inet6.ip6.forwarding=0
[freebsd.sysctl]: kern.ipc.somaxconn=32768
[freebsd.sysctl]: kern.ipc.maxsockbuf=16777216
[freebsd.sysctl]: net.link.ether.inet.maxhold=5
[freebsd.sysctl]: net.inet.tcp.fast_finwait2_recycle=1
[freebsd.sysctl]: net.inet.tcp.sendspace=16384
[freebsd.sysctl]: net.inet.tcp.recvspace=8192
[freebsd.sysctl]: net.inet.tcp.cc.algorithm=cubic
[freebsd.sysctl]: net.inet.tcp.sendbuf_max=16777216
[freebsd.sysctl]: net.inet.tcp.recvbuf_max=16777216
[freebsd.sysctl]: net.inet.tcp.sendbuf_auto=1
[freebsd.sysctl]: net.inet.tcp.recvbuf_auto=1
[freebsd.sysctl]: net.inet.tcp.sendbuf_inc=16384
[freebsd.sysctl]: net.inet.tcp.recvbuf_inc=524288
[freebsd.sysctl]: net.inet.tcp.sack.enable=1
[freebsd.sysctl]: net.inet.tcp.blackhole=1
[freebsd.sysctl]: net.inet.tcp.msl=2000
[freebsd.sysctl]: net.inet.tcp.delayed_ack=0
[freebsd.sysctl]: net.inet.udp.blackhole=1
[freebsd.sysctl]: net.inet.ip.redirect=0
[freebsd.sysctl]: net.inet.ip.forwarding=0
EAL: Detected 48 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: PRIMARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
f-stack -c1 -n8 --proc-type=auto EAL: Probing VFIO support...
EAL: PCI device 0000:00:04.0 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:00:04.1 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:00:04.2 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:00:04.3 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:00:04.4 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:00:04.5 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:00:04.6 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:00:04.7 on NUMA socket 0
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:1a:00.0 on NUMA socket 0
EAL:   probe driver: 8086:37d0 net_i40e
EAL: PCI device 0000:1a:00.1 on NUMA socket 0
EAL:   probe driver: 8086:37d0 net_i40e
EAL: PCI device 0000:1a:00.2 on NUMA socket 0
EAL:   probe driver: 8086:37d1 net_i40e
EAL: PCI device 0000:1a:00.3 on NUMA socket 0
EAL:   probe driver: 8086:37d1 net_i40e
EAL: PCI device 0000:3b:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1583 net_i40e
EAL: PCI device 0000:3b:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1583 net_i40e
EAL: PCI device 0000:80:04.0 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:80:04.1 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:80:04.2 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:80:04.3 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:80:04.4 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:80:04.5 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:80:04.6 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
EAL: PCI device 0000:80:04.7 on NUMA socket 1
EAL:   probe driver: 8086:2021 rawdev_ioat
lcore: 0, port: 0, queue: 0
lcore: 0, port: 1, queue: 0
create mbuf pool on socket 0
create ring:dispatch_ring_p0_q0 success, 2047 ring entries are now free!
create ring:dispatch_ring_p0_q1 success, 2047 ring entries are now free!
create ring:dispatch_ring_p1_q0 success, 2047 ring entries are now free!
create ring:dispatch_ring_p1_q1 success, 2047 ring entries are now free!
Port 0 MAC: 9c 69 b4 60 35 24
Port 0 modified RSS hash function based on hardware support,requested:0x3ffffc configured:0x7ef8
RX checksum offload supported
TX ip checksum offload supported
TX TCP&UDP checksum offload supported
TSO is disabled
port[0]: rss table size: 512
set port 0 to promiscuous mode ok
Port 1 MAC: 9c 69 b4 60 35 25
Port 1 modified RSS hash function based on hardware support,requested:0x3ffffc configured:0x7ef8
RX checksum offload supported
TX ip checksum offload supported
TX TCP&UDP checksum offload supported
TSO is disabled
port[1]: rss table size: 512
set port 1 to promiscuous mode ok

Checking link statusdone
Port 0 Link Up - speed 40000 Mbps - full-duplex
Port 1 Link Up - speed 40000 Mbps - full-duplex
link_elf_lookup_symbol: missing symbol hash table
link_elf_lookup_symbol: missing symbol hash table
Timecounters tick every 10.000 msec
Timecounter "ff_clock" frequency 100 Hz quality 1
f-stack-0: Ethernet address: 9c:69:b4:60:35:24
f-stack-1: Ethernet address: 9c:69:b4:60:35:25
ff_veth_set_gateway failed
 [WARN][server.c:main:112]: Init socket.
 [WARN][server.c:main:130]: Sockfd 1024 Wait for a Client...
 [WARN][server.c:main:132]: Ready to ff_run.
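
One detail in this log that may be worth checking: the config sets lcore_mask=3, while the EAL line shows this process was started with -c1 and only lcore 0 is mapped to a queue, even though two dispatch rings (q0 and q1) are created per port. The sketch below decodes a hexadecimal core mask into lcore indices to make the mismatch visible. The function name decode_lcore_mask is hypothetical and not part of F-Stack or DPDK. Whether one process per set bit has to be started is my assumption about F-Stack's multi-process model and is not confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Decode a hexadecimal core mask (for example the "3" in lcore_mask=3)
 * into the indices of the bits that are set. Returns the number of
 * lcores found, or -1 if the string is not valid hex or `out` is too
 * small. Hypothetical helper for illustration only.
 */
int decode_lcore_mask(const char *hex, int *out, size_t cap)
{
    assert(hex != NULL && out != NULL);

    char *end;
    unsigned long long mask = strtoull(hex, &end, 16);
    if (end == hex || *end != '\0')
        return -1;

    size_t n = 0;
    for (int bit = 0; mask != 0; bit++, mask >>= 1) {
        if (mask & 1ULL) {
            if (n == cap)
                return -1;
            out[n++] = bit;
        }
    }
    return (int)n;
}
```

Under this decoding, lcore_mask=3 selects lcores 0 and 1, whereas the -c1 shown in the log selects only lcore 0. If packets are spread over two RX queues and only one lcore is polling, some SYNs could go unprocessed, so trying lcore_mask=1 (or starting a second process, if that is what the mask expects) might be a useful experiment.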

Next, I run the client program on S1. The client continuously sends SYN requests to the server. I expected the server to accept the connection requests issued by the client and respond with data.

However, the server never seems to receive any events from the NIC driver. It neither accepts the connection requests nor produces the "ports.pcap" file in the target directory.

I have no idea what causes this. Could you help me with this problem? Any suggestions or comments will be appreciated.

jfb8856606 commented 4 years ago

# Each core writes into its own pcap file, which is opened once and closed once it is large enough.
# Supports dumping the first snaplen bytes of each packet.
# If the pcap file is larger than savelen bytes, it is closed and the next file is dumped into.
[pcap]
enable = 0
snaplen= 16777216
savelen= 16777216

# Port config section
# Corresponds to dpdk.port_list's index: port0, port1...
[port0]
addr=10.0.0.26
netmask=255.255.255.0
broadcast=10.0.0.255
gateway=10.0.0.254

Try changing:

  1. enable = 0 to enable = 1 in the [pcap] section.
  2. gateway=10.0.0.254 to gateway=10.0.0.25 in the [port0] section.

chenxiang2019 commented 4 years ago

Thank you for your response. I have tried the methods you mentioned, but the problem still exists. Are there any examples that illustrate the testing of F-Stack? Thanks!

forestmo commented 3 years ago

Has this problem been solved? I have run into the same problem.