F-Stack / f-stack

F-Stack is a user space network development kit with high performance, based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.
http://www.f-stack.org

F-stack's performance is worse than regular posix API #758

Open NDani23 opened 1 year ago

NDani23 commented 1 year ago

Hi F-stack team!

I've been experimenting with F-Stack for a few weeks now. I wrote a simple program that uses F-Stack to generate as much traffic as possible. I wanted to compare F-Stack's performance with the regular POSIX socket API, so I wrote a nearly identical server-client program that uses regular POSIX sockets (with epoll).

When running both client and server with F-Stack, ff_traffic on the server side shows that I receive data at 350-400 Mb/s. However, with the regular POSIX API, I measured around 450-500 Mb/s (I used bmon for measuring).
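When comparing figures from two different tools, it helps to compute throughput the same way on both sides and to confirm that both report the same unit (bits versus bytes per second). A minimal sketch, assuming a cumulative received-byte counter can be sampled twice; the function name and the sample values are hypothetical and are not taken from ff_traffic or bmon:

```python
def throughput_mbit_per_s(bytes_start, bytes_end, seconds):
    """Return average throughput in megabits per second (10**6 bits)."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (bytes_end - bytes_start) * 8 / seconds / 1_000_000


# Hypothetical samples: 50,000,000 bytes received in 1 second.
print(throughput_mbit_per_s(0, 50_000_000, 1.0))  # 400.0
```

Measuring over the same interval length on both setups also keeps short bursts from skewing one result more than the other.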

Could you please help me find out what I am doing wrong? I'm a beginner in this field, so I would be grateful for any advice.

I used two Oracle Ubuntu VMs for testing, with the Intel PRO/1000 MT Desktop (82540EM) adapter. Because of this, I could only run the F-Stack client and server with one core.

Client side code: https://github.com/NDani23/Tgen/blob/main/client.c
Server side code: https://github.com/NDani23/Tgen/blob/main/main.c

Config.ini:

[dpdk]
# Hexadecimal bitmask of cores to run on.
lcore_mask=1

# Number of memory channels.
channel=4

# Specify base virtual address to map.
#base_virtaddr=0x7f0000000000

# Promiscuous mode of nic, default: enabled.
promiscuous=1
numa_on=1

# TX checksum offload skip, default: disabled.
# We need this switch enabled in the following cases:
# -> The application want to enforce wrong checksum for testing purposes
# -> Some cards advertize the offload capability. However, doesn't calculate checksum.
tx_csum_offoad_skip=0

# TCP segment offload, default: disabled.
tso=0

# HW vlan strip, default: enabled.
vlan_strip=0

# sleep when no pkts incoming
# unit: microseconds
idle_sleep=0

# sent packet delay time(0-100) while send less than 32 pkts.
# default 100 us.
# if set 0, means send pkts immediately.
# if set >100, will delay 100 us.
# unit: microseconds
pkt_tx_delay=0

# use symmetric Receive-side Scaling(RSS) key, default: disabled.
symmetric_rss=0

# PCI device enable list.
# And driver options
#allow=02:00.0
# for multiple PCI devices
#allow=02:00.0,03:00.0

# enabled port list
#
# EBNF grammar:
#
#    exp      ::= num_list {"," num_list}
#    num_list ::= <num> | <range>
#    range    ::= <num>"-"<num>
#    num      ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
#
# examples
#    0-3       ports 0, 1,2,3 are enabled
#    1-3,4,7   ports 1,2,3,4,7 are enabled
#
# If bonding is used, configure the bonding port id in port_list
# and do not configure the slave port ids in port_list.
# For example, if port 0 and port 1 are trunked into bonding port 2,
# set `port_list=2` and configure the `[port2]` section.

port_list=0

# Number of vdev.
nb_vdev=0

# Number of bond.
nb_bond=0

# log level for dpdk, optional
# log_level=0

# Each core writes to its own pcap file, which is opened once and closed once it is large enough.
# Supports dumping the first snaplen bytes of each packet.
# If a pcap file grows larger than savelen bytes, it is closed and the dump continues in the next file.
[pcap]
enable=0
isnaplen=96
savelen=16777216
savepath=/home/daniel

# Port config section
# Correspond to dpdk.port_list's index: port0, port1...
[port0]
addr=192.168.0.194
netmask=255.255.255.0
broadcast=192.168.0.255
gateway=192.168.0.1

#[port1]
#addr=192.168.0.154
#netmask=255.255.255.0
#broadcast=192.168.0.255
#gateway=192.168.0.1
# set interface name, Optional parameter.
#if_name=eno7

# IPv6 net addr, Optional parameters.
#addr6=ff::02
#prefix_len=64
#gateway6=ff::01

# Multi virtual IPv4/IPv6 net addr, Optional parameters.
#   `vip_ifname`: default `f-stack-x`
#   `vip_addr`: Separated by semicolons, MAX number 64;
#           Only support netmask 255.255.255.255, broadcast x.x.x.255 now, hard code in `ff_veth_setvaddr`.
#   `vip_addr6`: Separated by semicolons, MAX number 64.
#   `vip_prefix_len`: All addr6 use the same prefix now, default 64.
#vip_ifname=lo0
#vip_addr=192.168.1.3;192.168.1.4;192.168.1.5;192.168.1.6
#vip_addr6=ff::03;ff::04;ff::05;ff::06;ff::07
#vip_prefix_len=64

# lcore list used to handle this port
# the format is same as port_list
#lcore_list=0

# bonding slave port list used to handle this port
# need to config while this port is a bonding port
# the format is same as port_list
#slave_port_list=0,1

# Vdev config section
# Correspond to dpdk.nb_vdev's index: vdev0, vdev1...
#    iface : Shouldn't set always.
#    path : The vuser device path in container. Required.
#    queues : The max queues of vuser. Optional, default 1, greater or equal to the number of processes.
#    queue_size : Queue size.Optional, default 256.
#    mac : The mac address of vuser. Optional, default random, if vhost use phy NIC, it should be set to the phy NIC's mac.
#    cq : Optional, if queues = 1, default 0; if queues > 1 default 1.
#[vdev0]
##iface=/usr/local/var/run/openvswitch/vhost-user0
#path=/var/run/openvswitch/vhost-user0
#queues=1
#queue_size=256
#mac=00:00:00:00:00:01
#cq=0

# bond config section
# See http://doc.dpdk.org/guides/prog_guide/link_bonding_poll_mode_drv_lib.html
#[bond0]
#mode=4
#slave=0000:0a:00.0,slave=0000:0a:00.1
#primary=0000:0a:00.0
#mac=f0:98:38:xx:xx:xx
## opt argument
#socket_id=0
#xmit_policy=l23
#lsc_poll_period_ms=100
#up_delay=10
#down_delay=50

# Kni config: if enabled and method=reject,
# all packets that do not belong to the following tcp_port and udp_port
# will transmit to kernel; if method=accept, all packets that belong to
# the following tcp_port and udp_port will transmit to kernel.
[kni]
enable=0
method=reject
# The format is same as port_list
tcp_port=80,8080
udp_port=53

# FreeBSD network performance tuning configurations.
# Most native FreeBSD configurations are supported.
[freebsd.boot]
# If use rack/bbr which depend HPTS, you should set a greater value of hz, such as 1000000 means a tick is 1us.
hz=100

# Block out a range of descriptors to avoid overlap
# with the kernel's descriptor space.
# You can increase this value according to your app.
fd_reserve=1024

kern.ipc.maxsockets=262144

net.inet.tcp.syncache.hashsize=4096
net.inet.tcp.syncache.bucketlimit=100

net.inet.tcp.tcbhashsize=65536

kern.ncallout=262144

kern.features.inet6=1

[freebsd.sysctl]
kern.ipc.somaxconn=32768
kern.ipc.maxsockbuf=16777216

net.link.ether.inet.maxhold=5

net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.sendspace=60000
net.inet.tcp.recvspace=84000
#net.inet.tcp.recvspace=80000
#net.inet.tcp.nolocaltimewait=1
net.inet.tcp.cc.algorithm=cubic
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
#net.inet.tcp.recvbuf_max=100000000
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
#net.inet.tcp.recvbuf_inc=16384
net.inet.tcp.sack.enable=1
net.inet.tcp.blackhole=1
net.inet.tcp.msl=2000
net.inet.tcp.delayed_ack=1
net.inet.tcp.rfc1323=1

net.inet.udp.blackhole=1
net.inet.ip.redirect=0
net.inet.ip.forwarding=0

net.inet6.ip6.auto_linklocal=1
net.inet6.ip6.accept_rtadv=2
net.inet6.icmp6.rediraccept=1
net.inet6.ip6.forwarding=0

# set default stacks:freebsd, rack or bbr, may be you need increase the value of parameter 'freebsd.boot.hz' while use rack or bbr.
net.inet.tcp.functions_default=freebsd
# need by bbr, should enable it.
net.inet.tcp.hpts.skip_swi=1
# Interval between calls to hpts_timeout_dir. default min 250us, max 256-512ms, default 512ms.
net.inet.tcp.hpts.minsleep=250
# [25600-51200]
net.inet.tcp.hpts.maxsleep=51200
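Two settings in the config above use compact encodings: `lcore_mask` is a hexadecimal bitmask of cores, and `port_list` follows the EBNF grammar given in the comments. The sketch below shows how such values expand. It is only an illustration with hypothetical helper names, not F-Stack's own parser:

```python
def lcores_from_mask(mask_hex):
    """Decode a hexadecimal core bitmask into a list of core indices."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]


def parse_port_list(text):
    """Expand a list such as '1-3,4,7' following the EBNF in the comments."""
    ports = []
    for item in text.split(","):
        item = item.strip()
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            ports.extend(range(first, last + 1))
        else:
            ports.append(int(item))
    return ports


print(lcores_from_mask("1"))       # [0]
print(parse_port_list("0-3"))      # [0, 1, 2, 3]
print(parse_port_list("1-3,4,7"))  # [1, 2, 3, 4, 7]
```

With `lcore_mask=1` only core 0 is selected, which matches the single-core setup described in the post.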

I thought maybe something went wrong with the hugepage allocation:

grep Huge /proc/meminfo

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:     236
HugePages_Free:      214
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:          483328 kB
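The figures above are internally consistent: 236 pages of 2048 kB give the reported 483328 kB, and 236 - 214 = 22 pages were in use when the snapshot was taken. A small sketch that repeats this arithmetic from the command output (the helper name is hypothetical; whether 22 pages is enough for this workload is not something the numbers alone can show):

```python
def parse_meminfo(text):
    """Parse 'Key:  value [kB]' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


sample = """\
HugePages_Total:     236
HugePages_Free:      214
Hugepagesize:       2048 kB
Hugetlb:          483328 kB
"""

info = parse_meminfo(sample)
in_use = info["HugePages_Total"] - info["HugePages_Free"]
print(in_use)                                          # 22
print(in_use * info["Hugepagesize"] // 1024)           # 44 (MiB)
print(info["HugePages_Total"] * info["Hugepagesize"])  # 483328 (kB)
```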

Sorry for the long post.

Keep up the good work!

CodingLappen commented 1 year ago

Well, the NIC is virtualized in the kernel, so of course it has the same or similar throughput.

jfb8856606 commented 1 year ago

Because your client BUF_SIZE is 16384, maybe you can try changing pkt_tx_delay=0 to pkt_tx_delay=1. And because you are testing with a single connection, you should change net.inet.tcp.delayed_ack=1 to net.inet.tcp.delayed_ack=0. Then run the test again.

pkt_tx_delay and net.inet.tcp.delayed_ack need to be set to different values in different test scenarios to achieve better performance.
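Since these two values have to be tuned per scenario, it can be convenient to generate config variants from one base file rather than editing by hand. A minimal sketch, assuming plain `key=value` lines as in the config.ini above; the helper name is hypothetical, and it replaces a key in every section where it appears, so it is not suitable for keys such as `enable` that occur in several sections:

```python
import re


def override_settings(config_text, overrides):
    """Return config_text with 'key=value' lines replaced per overrides.

    Commented-out lines (starting with '#') are left untouched.
    """
    lines = []
    for line in config_text.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9_.]+)\s*=", line)
        if match and match.group(1) in overrides:
            key = match.group(1)
            line = f"{key}={overrides[key]}"
        lines.append(line)
    return "\n".join(lines) + "\n"


base = "pkt_tx_delay=0\n#net.inet.tcp.delayed_ack=0\nnet.inet.tcp.delayed_ack=1\n"
variant = override_settings(
    base, {"pkt_tx_delay": 1, "net.inet.tcp.delayed_ack": 0}
)
print(variant)
```

Each variant can then be written to its own file and the same traffic test repeated against it, so that only the changed settings differ between runs.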

jfb8856606 commented 1 year ago

You can also try adjusting tso=0 in config.ini and test again.