awesometic / realtek-r8125-dkms

A DKMS package for easy use of Realtek r8125 driver, which supports 2.5 GbE.
GNU General Public License v2.0

Throughput monitoring issue #33

Closed mschirrmeister closed 1 year ago

mschirrmeister commented 1 year ago

Hello,

I am running the latest version, 9.011.00-NAPI, with kernel 6.1, and it shows wrong throughput values. The NIC is connected to a 1 Gbit switch. With the kernel's default r8169 driver, throughput monitoring tools typically show around 115 MB/s. With the r8125 driver, they show several hundred gigabytes per second, fluctuating between 300 and 700 GB/s.
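As a quick plausibility check on those figures, here is a minimal sketch comparing the reported range with the raw line rate of a 1 Gbit/s link. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead; the function name is only illustrative.

```python
# Back-of-the-envelope check: can a 1 Gbit/s link carry the reported rates?
LINK_SPEED_BITS_PER_S = 1_000_000_000  # 1 Gbit/s switch port, from the report


def max_bytes_per_second(link_bits_per_s: int) -> float:
    """Raw line rate in bytes per second, ignoring protocol overhead."""
    return link_bits_per_s / 8


ceiling = max_bytes_per_second(LINK_SPEED_BITS_PER_S)

# Reported range with the r8125 driver, assuming decimal gigabytes.
reported_low = 300e9
reported_high = 700e9

print(f"line-rate ceiling: {ceiling / 1e6:.0f} MB/s")
print(f"reported range is {reported_low / ceiling:.0f}x "
      f"to {reported_high / ceiling:.0f}x the ceiling")
```

The roughly 115 MB/s seen with r8169 sits just under this ceiling, while the r8125 readings are thousands of times above it, so they cannot reflect real traffic on this link.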

Driver

root@nightowl ~# ethtool -i ens4
driver: r8125
version: 9.011.00-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

pci device

root@nightowl ~# lspci -s 01:00.0 -k
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
    Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller
    Kernel driver in use: r8125
    Kernel modules: r8169, r8125

Example wrong value.

  bwm-ng v0.6.3 (probing every 0.500s), press 'h' for help
  input: /proc/net/dev type: rate
  -         iface                   Rx                   Tx                Total
  ==============================================================================
               lo:           0.00  B/s            0.00  B/s            0.00  B/s
             ens4:         635.39 GB/s          546.15 KB/s          635.39 GB/s
             ens5:           0.00  B/s            0.00  B/s            0.00  B/s
  ------------------------------------------------------------------------------
            total:         635.39 GB/s          546.15 KB/s          635.39 GB/s

Any idea if I am doing something wrong, or is this a known issue?
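
The bwm-ng output above reads its counters from /proc/net/dev. For cross-checking the numbers without a monitoring tool, here is a minimal sketch that samples the byte counters twice and derives a rate. The helper names (`parse_net_dev`, `rates`, `print_rates`) are hypothetical, and this is not necessarily how bwm-ng computes its values internally.

```python
import time


def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before, after, interval_s):
    """Bytes per second (rx, tx) per interface between two samples."""
    result = {}
    for iface, (rx1, tx1) in after.items():
        if iface not in before:
            continue
        rx0, tx0 = before[iface]
        result[iface] = ((rx1 - rx0) / interval_s, (tx1 - tx0) / interval_s)
    return result


def print_rates(interval_s=0.5, path="/proc/net/dev"):
    """Sample the counters twice and print the rate for each interface."""
    with open(path) as f:
        first = parse_net_dev(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_net_dev(f.read())
    for iface, (rx, tx) in rates(first, second, interval_s).items():
        print(f"{iface:>10}: rx {rx / 1e6:10.2f} MB/s  tx {tx / 1e6:10.2f} MB/s")
```

Calling `print_rates()` on the affected machine with both drivers would show whether the raw counters themselves jump, independent of any particular monitoring tool.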

awesometic commented 1 year ago

Hello,

Can you try this Debian package? I reverted some changes that came from this 9.011.00 version.

Please remove the .zip extension from the attached file; GitHub restricts which file types can be uploaded.

realtek-r8125-dkms_9.011.00-2_amd64.deb.zip

mschirrmeister commented 1 year ago

The package does not install. Error is below.

DKMS make.log for realtek-r8125-9.011.00 for kernel 6.1.0-7-amd64 (amd64)
Wed Apr 12 08:50:18 AM CEST 2023
/bin/sh: 1: VER: not found
make -C src/ KVER=6.1.0-7-amd64 BASEDIR=/lib/modules/6.1.0-7-amd64 modules
make[1]: Entering directory '/var/lib/dkms/realtek-r8125/9.011.00/build/src'
make -C /lib/modules/6.1.0-7-amd64/build M=/var/lib/dkms/realtek-r8125/9.011.00/build/src modules
make[2]: Entering directory '/usr/src/linux-headers-6.1.0-7-amd64'
  CC [M]  /var/lib/dkms/realtek-r8125/9.011.00/build/src/r8125_n.o
  CC [M]  /var/lib/dkms/realtek-r8125/9.011.00/build/src/rtl_eeprom.o
  CC [M]  /var/lib/dkms/realtek-r8125/9.011.00/build/src/rtltool.o
/var/lib/dkms/realtek-r8125/9.011.00/build/src/r8125_n.c:13512:31: error: ‘rtl8125_get_stats’ undeclared here (not in a function); did you mean ‘rtl8125_get_stats64’?
13512 |         .ndo_get_stats      = rtl8125_get_stats,
      |                               ^~~~~~~~~~~~~~~~~
      |                               rtl8125_get_stats64
/var/lib/dkms/realtek-r8125/9.011.00/build/src/r8125_n.c:13468:1: warning: ‘rtl8125_get_stats64’ defined but not used [-Wunused-function]
13468 | rtl8125_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
      | ^~~~~~~~~~~~~~~~~~~
make[3]: *** [/usr/src/linux-headers-6.1.0-7-common/scripts/Makefile.build:255: /var/lib/dkms/realtek-r8125/9.011.00/build/src/r8125_n.o] Error 1
make[2]: *** [/usr/src/linux-headers-6.1.0-7-common/Makefile:2037: /var/lib/dkms/realtek-r8125/9.011.00/build/src] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-6.1.0-7-amd64'
make[1]: *** [Makefile:188: modules] Error 2
make[1]: Leaving directory '/var/lib/dkms/realtek-r8125/9.011.00/build/src'
make: *** [Makefile:42: modules] Error 2

I will be unavailable for the next 3 weeks, so I can most likely run the next test only at the beginning of May.

awesometic commented 1 year ago

Looks like the compiler options caused that.

Here is the new file: realtek-r8125-dkms_9.011.00-2_amd64.deb.zip

I checked it compiles normally. Sorry for the inconvenience 😅

mschirrmeister commented 1 year ago

Thanks. That one installs fine, but the problem is still there. It still shows GB/s. The number itself might be a little better, but it still goes too high and fluctuates more than with the r8169 driver.

awesometic commented 1 year ago

Then we should check if it happens on the other kernel versions too.

I neutralized some conditions for kernel version 5.11.0 or above in the network statistics code, which were not present in the previous version. There may be another place I should look at, but in any case Realtek should be made aware of this error and release a new version if it also occurs on other systems.
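
For readers unfamiliar with such version conditions, here is a minimal sketch of how a "5.11.0 or above" check evaluates. It assumes the driver uses the usual `LINUX_VERSION_CODE >= KERNEL_VERSION(5, 11, 0)` pattern, which this thread does not confirm. The Python function mirrors the kernel's `KERNEL_VERSION` macro, and the 5.10.0 entry is a hypothetical older kernel included only for contrast.

```python
def kernel_version_code(major, minor, patch):
    """Mirror of the kernel's KERNEL_VERSION(a, b, c) macro.

    Recent kernels clamp the patch level to 255 so that it fits in 8 bits.
    """
    return (major << 16) + (minor << 8) + min(patch, 255)


THRESHOLD = kernel_version_code(5, 11, 0)

kernels = {
    "6.1.0": (6, 1, 0),    # kernel from the original report (6.1.0-7-amd64)
    "5.10.0": (5, 10, 0),  # hypothetical older kernel, for contrast only
}

for name, parts in kernels.items():
    takes_new_path = kernel_version_code(*parts) >= THRESHOLD
    print(f"{name}: takes the '>= 5.11.0' code path: {takes_new_path}")
```

Under this assumption, the reporter's 6.1 kernel falls on the side of the condition that was changed.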

mschirrmeister commented 1 year ago

Looks like it is to some extent kernel dependent. I tested the following 2 kernels. Both have the problem as well.

On 5.15.94-x86 it looks worse. Numbers again go up to 400 GB/s.

I thought about reporting it to Realtek too, but have not found a good way to do so yet. The only thing I found is a support email address for network cards. I might drop a mail there, and let's hope they can fix it.

awesometic commented 1 year ago

Thank you for the test and for reporting it to Realtek. Let's wait for the new version.

dream10201 commented 1 year ago

I have the same problem, and after running for a while, dmesg shows these errors:

[  993.729605] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B      OE      6.2.10-arch1-1 #1 3b64a9154b84a23b8badf9e10678249884a952c6
[  993.729609] Hardware name: Default string Default string/Default string, BIOS 1.010 09/27/2021
[  993.729611] ==================================================================
[ 1002.885181] ==================================================================
[ 1002.885190] BUG: KFENCE: use-after-free read in rtl8125_rx_interrupt+0x347/0x5c0 [r8125]

[ 1002.885207] Use-after-free read at 0x0000000024c7079d (in kfence-#222):
[ 1002.885210]  rtl8125_rx_interrupt+0x347/0x5c0 [r8125]
[ 1002.885221]  rtl8125_poll_msix_rx+0x45/0x90 [r8125]
[ 1002.885231]  __napi_poll+0x28/0x1b0
[ 1002.885238]  net_rx_action+0x2a2/0x360
[ 1002.885241]  __do_softirq+0xd1/0x2c8
[ 1002.885245]  __irq_exit_rcu+0xb7/0xe0
[ 1002.885250]  common_interrupt+0x86/0xa0
[ 1002.885252]  asm_common_interrupt+0x26/0x40
[ 1002.885257]  cpuidle_enter_state+0xe2/0x420
[ 1002.885261]  cpuidle_enter+0x2d/0x40
[ 1002.885263]  do_idle+0x1ed/0x270
[ 1002.885266]  cpu_startup_entry+0x1d/0x20
[ 1002.885269]  rest_init+0xc8/0xd0
[ 1002.885272]  arch_call_rest_init+0xe/0x30
[ 1002.885277]  start_kernel+0x734/0xb30
[ 1002.885280]  secondary_startup_64_no_verify+0xe5/0xeb

[ 1002.885286] kfence-#222: 0x000000001096ce9d-0x00000000efff5d14, size=232, cache=skbuff_head_cache

[ 1002.885289] allocated by task 412 on cpu 2 at 1002.503026s:
[ 1002.885295]  __alloc_skb+0x167/0x1d0
[ 1002.885299]  alloc_skb_with_frags+0x50/0x200
[ 1002.885301]  sock_alloc_send_pskb+0x203/0x250
[ 1002.885304]  __ip_append_data+0x998/0x1070
[ 1002.885308]  ip_make_skb+0x105/0x140
[ 1002.885310]  udp_sendmsg+0xacf/0xe90
[ 1002.885314]  udpv6_sendmsg+0x469/0x1050
[ 1002.885317]  sock_sendmsg+0x46/0x70
[ 1002.885319]  ____sys_sendmsg+0x17f/0x2f0
[ 1002.885321]  ___sys_sendmsg+0x9a/0xe0
[ 1002.885323]  __sys_sendmmsg+0xe3/0x210
[ 1002.885326]  __x64_sys_sendmmsg+0x21/0x30
[ 1002.885329]  do_syscall_64+0x5c/0x90
[ 1002.885332]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

[ 1002.885336] freed by task 0 on cpu 0 at 1002.885162s:
[ 1002.885372]  tcp_data_queue+0x5a6/0xec0
[ 1002.885375]  tcp_rcv_established+0x210/0x730
[ 1002.885378]  tcp_v6_do_rcv+0xde/0x4c0
[ 1002.885380]  tcp_v6_rcv+0xc88/0xd00
[ 1002.885383]  ip6_protocol_deliver_rcu+0x6c/0x480
[ 1002.885385]  ip6_input_finish+0x43/0x60
[ 1002.885386]  ip6_sublist_rcv_finish+0x59/0x90
[ 1002.885388]  ip6_sublist_rcv+0x22f/0x2f0
[ 1002.885390]  ipv6_list_rcv+0x13f/0x170
[ 1002.885392]  __netif_receive_skb_list_core+0x1f6/0x2c0
[ 1002.885395]  netif_receive_skb_list_internal+0x1d1/0x310
[ 1002.885398]  napi_gro_receive+0xd0/0x210
[ 1002.885400]  rtl8125_rx_interrupt+0x33d/0x5c0 [r8125]
[ 1002.885410]  rtl8125_poll_msix_rx+0x45/0x90 [r8125]
[ 1002.885420]  __napi_poll+0x28/0x1b0
[ 1002.885423]  net_rx_action+0x2a2/0x360
[ 1002.885426]  __do_softirq+0xd1/0x2c8
[ 1002.885428]  __irq_exit_rcu+0xb7/0xe0
[ 1002.885430]  common_interrupt+0x86/0xa0
[ 1002.885432]  asm_common_interrupt+0x26/0x40
[ 1002.885435]  cpuidle_enter_state+0xe2/0x420
[ 1002.885438]  cpuidle_enter+0x2d/0x40
[ 1002.885440]  do_idle+0x1ed/0x270
[ 1002.885442]  cpu_startup_entry+0x1d/0x20
[ 1002.885444]  rest_init+0xc8/0xd0
[ 1002.885447]  arch_call_rest_init+0xe/0x30
[ 1002.885450]  start_kernel+0x734/0xb30
[ 1002.885452]  secondary_startup_64_no_verify+0xe5/0xeb

dream10201 commented 1 year ago

@mschirrmeister This is the official test version provided by Realtek. Maybe you can try it; I'm not near the device and can't test it myself.

r8125-9.011.01_20230412_b1.zip

dream10201 commented 1 year ago

@awesometic After a period of testing, the problem did not reproduce.

awesometic commented 1 year ago

@dream10201

Thank you for your effort.

Can we merge that beta version into our repository? It will be open-sourced anyway, but I don't know if we can use the unpublished version 🤔

dream10201 commented 1 year ago

@awesometic Maybe it would be better to create a patch file and apply it before compiling?
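
To illustrate the patch-file idea, here is a minimal sketch that produces a unified diff between two versions of a source file using Python's standard `difflib`. The file path and contents are placeholders, not real driver code, and `make_patch` is a hypothetical helper name.

```python
import difflib


def make_patch(old_text, new_text, path):
    """Return a unified diff that turns old_text into new_text."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Placeholder contents standing in for the released and beta sources.
old = "int counter = 0;\nreturn counter;\n"
new = "int counter = 1;\nreturn counter;\n"

patch = make_patch(old, new, "src/example.c")
print(patch)
```

A diff in this format, generated from the released and beta sources, could be stored in the repository and applied with the standard `patch -p1` command before the build, so the unpublished files themselves would not need to be committed.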

mschirrmeister commented 1 year ago

What @dream10201 posted here is also what Linda sent me in reply to my question to Realtek. She mentioned that I can share the version here, since I am on vacation until early May. But @dream10201 shared the driver already. :-)

Linda also mentioned to me that they will apply the change in their next driver releases as well.

awesometic commented 1 year ago

Fixed in version 9.011.01 :)