bondagit / ravenna-alsa-lkm

RAVENNA ALSA LKM

PTP locking problem on kernel 6.8.2 (Arch Linux) #18

Closed cedonik closed 6 months ago

cedonik commented 7 months ago

Hi,

With latest aes67-daemon branch 416b068 ptp clock is not locking on kernel 6.8.2 on Arch Linux.

On LTS kernel 6.6.23 (same machine) ptp locking works.

Network card is Intel 219L (hardware ptp), master clock is on "KH750 AES67" and also with linuxptp on i350. Tried on another system (freshly installed) with i219LM, with same result.

Thanks!

bondagit commented 7 months ago

I have just tried on an x86_64 N40 mini PC and it works for me with kernel 6.8.3. I am using a Dante AVIO USB device as the master PTP clock.

cedonik commented 7 months ago

Which OS do you use? I've just installed ubuntu-mate 24.04 (daily build) with kernel 6.8.0-22-generic and again no locking. Previous ubuntu 22.04 with kernel 6.5 works as expected.

Also run_test.sh and run_latency_test.sh are not working, could not lock on "lo" also i guess

bondagit commented 7 months ago

Ok, I see. I installed Ubuntu 22.04 LTS and then I upgraded the kernel using the instruction at: https://askubuntu.com/questions/1388115/how-do-i-update-my-kernel-to-the-latest-one

To summarise:

  1. get the ubuntu-mainline-kernel.sh script:
    wget https://raw.githubusercontent.com/pimlie/ubuntu-mainline-kernel.sh/master/ubuntu-mainline-kernel.sh
    sudo install ubuntu-mainline-kernel.sh /usr/local/bin/
  2. list the currently installed versions: ubuntu-mainline-kernel.sh -l
  3. list the available versions: ubuntu-mainline-kernel.sh -r
  4. install a version: sudo ubuntu-mainline-kernel.sh -i v6.8.3.
  5. reboot
  6. recompile the kernel module and retry

cedonik commented 7 months ago

Could you double check which kernel you are using?

Although it does not show an error, ubuntu-mainline-kernel.sh could not install cleanly linux-headers-6.8.3-060803_6.8.3-060803.202404031037_all.deb on ubuntu 22.04 because of missing dependencies:

dpkg: dependency problems prevent configuration of linux-headers-6.8.3-060803-generic:
 linux-headers-6.8.3-060803-generic depends on libc6 (>= 2.38); however:
  Version of libc6:amd64 on system is 2.35-0ubuntu3.6.
 linux-headers-6.8.3-060803-generic depends on libelf1t64 (>= 0.144); however:
  Package libelf1t64 is not installed.
 linux-headers-6.8.3-060803-generic depends on libssl3t64 (>= 3.0.0); however:
  Package libssl3t64 is not installed.

Also kernel 6.8.3 is compiled with gcc-13, which is not available on official repositories for ubuntu 22.04, so driver compiling fails with:

root@julija-HP-Laptop-15-da2xxx:~/aes67-linux-daemon/3rdparty/ravenna-alsa-lkm/driver# make
make -C /lib/modules/6.8.3-060803-generic/build/ M=/root/aes67-linux-daemon/3rdparty/ravenna-alsa-lkm/driver modules
make[1]: Entering directory '/usr/src/linux-headers-6.8.3-060803-generic'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-21ubuntu1) 13.2.0
  You are using:
  CC [M] /root/aes67-linux-daemon/3rdparty/ravenna-alsa-lkm/driver/c_wrapper_lib.o
/bin/sh: 1: gcc-13: not found
make[3]: [scripts/Makefile.build:243: /root/aes67-linux-daemon/3rdparty/ravenna-alsa-lkm/driver/c_wrapper_lib.o] Error 127
make[2]: [/usr/src/linux-headers-6.8.3-060803-generic/Makefile:1927: /root/aes67-linux-daemon/3rdparty/ravenna-alsa-lkm/driver] Error 2
make[1]: [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.8.3-060803-generic'
make: [Makefile:15: modules] Error 2

bondagit commented 7 months ago

I could install that kernel version, but yes I also have some troubles related to dependencies. Try to run a package upgrade first. For installing gcc-13 I used instructions at: https://askubuntu.com/questions/1490387/how-do-i-install-the-gcc-13-aarch64-cross-compiler-on-ubuntu-22-04

bondagit commented 7 months ago

You should also be able to start from a more recent Ubuntu release and try with kernel v6.8.3 or v6.8.4

cedonik commented 7 months ago

Try to run a package upgrade first.

Packages are fully up to date from the official repos; the 6.8 kernel is not intended for Ubuntu 22.04.

For installing gcc-13 I used instructions at:

This is for ubuntu 23.04 and 23.10

I have also tried odroid-c2 (aarch64) on arch linux with mainline kernel 6.8.2 without locking as well.

I'll try with Ubuntu 23.10 and the 6.8.4 kernel now, but I doubt it will change anything for Arch Linux, whose kernel carries minimal patches... Also, like I've said, the latest Ubuntu does not work with the 6.8 kernel either.

cedonik commented 7 months ago

Locking on Ubuntu 23.10 with kernel 6.5.0-26-generic works.

Upgrade to 6.8.4 also fails to finish up on linux-headers-6.8.4-060804-generic_6.8.4-060804.202404041833_amd64.deb

dpkg: dependency problems prevent configuration of linux-headers-6.8.4-060804-generic:
 linux-headers-6.8.4-060804-generic depends on libelf1t64 (>= 0.144); however:
  Package libelf1t64 is not installed.
 linux-headers-6.8.4-060804-generic depends on libssl3t64 (>= 3.0.0); however:
  Package libssl3t64 is not installed.
dpkg: error processing package linux-headers-6.8.4-060804-generic (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 linux-headers-6.8.4-060804-generic

Package is unpacked (not cleanly installed), so headers are in place for kernel compile regardless of missing dependencies... Ubuntu 23.10 is using gcc-13, now module is compiled and locking finally working on 6.8.4 and ubuntu 23.10

So, the current situation is:

I will be happy to help if you need some further testing.

Thanks.

bondagit commented 7 months ago

On the distro/kernel combinations where the daemon doesn't lock I think it would be interesting to verify if the PTP master clock packets reach the host. For example with: sudo tcpdump -vv -i eth0 dst 224.0.1.129

cedonik commented 7 months ago

PTP sync is perfectly stable on all hardware/software combinations (including odroid-c2) with linuxptp as client in hardware and software mode

ptp4l -i enp0s31f6 -m -q -l7 -s
......
ptp4l[3845.406]: port 1 (enp0s31f6): setting asCapable
ptp4l[3845.406]: port 1 (enp0s31f6): new foreign master 283638.fffe.60b9fd-1 --- this is MAC from KH750
.....
ptp4l[4046.255]: master offset -25351 s2 freq -1628169 path delay 119935
ptp4l[4046.792]: port 1 (enp0s31f6): delay timeout
ptp4l[4046.794]: delay filtered 121610 raw 121032
ptp4l[4046.947]: port 1 (enp0s31f6): delay timeout
ptp4l[4046.949]: delay filtered 119357 raw 111867
ptp4l[4047.158]: master offset -32693 s2 freq -1643116 path delay 119357
ptp4l[4047.989]: port 1 (enp0s31f6): delay timeout
ptp4l[4047.991]: delay filtered 119357 raw 92123
ptp4l[4048.062]: master offset 32491 s2 freq -1587740 path delay 119357
ptp4l[4048.965]: master offset 44394 s2 freq -1566090 path delay 119357

Also, the test scripts run_test.sh and run_latency_test.sh, which run on the loopback interface, cannot lock either, so no switch, firewall or anything else on the network is interfering with synchronization.

cedonik commented 7 months ago

Sorry I forgot tcpdump logs:

[root@cedo-lenovo-arch ~]# tcpdump -vv -i enp0s31f6 dst 224.0.1.129
tcpdump: listening on enp0s31f6, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:07:03.235708 IP (tos 0xb8, ttl 32, id 18662, offset 0, flags [DF], proto UDP (17), length 92)
    192.168.30.100.ptp-general > ptp-primary.mcast.net.ptp-general: [udp sum ok] PTPv2, v1 compat : no, msg type : announce msg, length : 64, domain : 0, reserved1 : 0, Flags [timescale], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x283638fffe60b9fd, port id : 1, seq id : 3391, control : 5 (Other), log message interval : 1, originTimeStamp : 0 seconds 0 nanoseconds, origin cur utc :37, rsvd : 0, gm priority_1 : 128, gm clock class : 248, gm clock accuracy : 33, gm clock variance : 17258, gm priority_2 : 128, gm clock id : 0x283638fffe60b9fd, steps removed : 0, time source : 0xa0
19:07:03.624501 IP (tos 0xb8, ttl 32, id 18681, offset 0, flags [DF], proto UDP (17), length 72)
    192.168.30.100.ptp-event > ptp-primary.mcast.net.ptp-event: [udp sum ok] PTPv2, v1 compat : no, msg type : sync msg, length : 44, domain : 0, reserved1 : 0, Flags [two step], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x283638fffe60b9fd, port id : 1, seq id : 6766, control : 0 (Sync), log message interval : 0, originTimeStamp : 0 seconds, 0 nanoseconds
19:07:03.625573 IP (tos 0xb8, ttl 32, id 18682, offset 0, flags [DF], proto UDP (17), length 72)
    192.168.30.100.ptp-general > ptp-primary.mcast.net.ptp-general: [udp sum ok] PTPv2, v1 compat : no, msg type : follow up msg, length : 44, domain : 0, reserved1 : 0, Flags [none], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x283638fffe60b9fd, port id : 1, seq id : 6766, control : 2 (Follow_Up), log message interval : 0, preciseOriginTimeStamp : 1167616520 seconds, 944848448 nanoseconds
19:07:04.528372 IP (tos 0xb8, ttl 32, id 18868, offset 0, flags [DF], proto UDP (17), length 72)
    192.168.30.100.ptp-event > ptp-primary.mcast.net.ptp-event: [udp sum ok] PTPv2, v1 compat : no, msg type : sync msg, length : 44, domain : 0, reserved1 : 0, Flags [two step], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x283638fffe60b9fd, port id : 1, seq id : 6767, control : 0 (Sync), log message interval : 0, originTimeStamp : 0 seconds, 0 nanoseconds
19:07:04.529612 IP (tos 0xb8, ttl 32, id 18869, offset 0, flags [DF], proto UDP (17), length 72)
    192.168.30.100.ptp-general > ptp-primary.mcast.net.ptp-general: [udp sum ok] PTPv2, v1 compat : no, msg type : follow up msg, length : 44, domain : 0, reserved1 : 0, Flags [none], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x283638fffe60b9fd, port id : 1, seq id : 6767, control : 2 (Follow_Up), log message interval : 0, preciseOriginTimeStamp : 1167616521 seconds, 948867194 nanoseconds
19:07:05.039262 IP (tos 0xb8, ttl 32, id 18980, offset 0, flags [DF], proto UDP (17), length 92)
    192.168.30.100.ptp-general > ptp-primary.mcast.net.ptp-general: [udp sum ok] PTPv2, v1 compat : no, msg type : announce msg, length : 64, domain : 0, reserved1 : 0, Flags [timescale], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x283638fffe60b9fd, port id : 1, seq id : 3392, control : 5 (Other), log message interval : 1, originTimeStamp : 0 seconds 0 nanoseconds, origin cur utc :37, rsvd : 0, gm priority_1 : 128, gm clock class : 248, gm clock accuracy : 33, gm clock variance : 17258, gm priority_2 : 128, gm clock id : 0x283638fffe60b9fd, steps removed : 0, time source : 0xa0
19:07:05.431755 IP (tos 0xb8, ttl 32, id 18992, offset 0, flags [DF], proto UDP (17), length 72)
    192.168.30.100.ptp-event > ptp-primary.mcast.net.ptp-event: [udp sum ok] PTPv2, v1 compat : no, msg type : sync msg, length : 44, domain : 0, reserved1 : 0, Flags [two step], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x283638fffe60b9fd, port id : 1, seq id : 6768, control : 0 (Sync), log message interval : 0, originTimeStamp : 0 seconds, 0 nanoseconds
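
For readers comparing captures like the one above, the message type, length, domain and sequence id that tcpdump prints all come from the 34-byte PTPv2 common header. The following is a minimal user-space sketch of reading those fields from a raw UDP payload; it is not code from the driver or the daemon, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the PTPv2 common header. */
#define PTP_HEADER_LEN 34

/* Subset of the common header fields that tcpdump prints per packet. */
struct ptp_header_info {
    uint8_t  message_type;    /* 0x0 Sync, 0x8 Follow_Up, 0xB Announce */
    uint8_t  version;         /* 2 for PTPv2 */
    uint16_t message_length;  /* e.g. 44 for Sync, 64 for Announce */
    uint8_t  domain;
    uint16_t sequence_id;
};

/*
 * Hypothetical helper: decodes the header at the start of a UDP payload.
 * Returns 0 on success, -1 if an argument is NULL or the buffer is too short.
 */
int ptp_parse_header(const uint8_t *buf, size_t len, struct ptp_header_info *out)
{
    if (buf == NULL || out == NULL || len < PTP_HEADER_LEN)
        return -1;

    out->message_type   = buf[0] & 0x0F;                    /* low nibble of byte 0 */
    out->version        = buf[1] & 0x0F;                    /* low nibble of byte 1 */
    out->message_length = (uint16_t)((buf[2] << 8) | buf[3]); /* big-endian */
    out->domain         = buf[4];
    out->sequence_id    = (uint16_t)((buf[30] << 8) | buf[31]);
    return 0;
}
```

Feeding it the payload of the first sync message in the capture should yield message type 0, version 2, length 44 and sequence id 6766, matching what tcpdump reports.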

zophos commented 6 months ago

I have the same problem on Ubuntu 24.04 (kernel 6.8.0). The driver is unable to lock the ptp clock, whether using dante AVIO or ptp4l as the master clock. On Ubuntu 22.04 (kernel 6.5.25), with the same network/hardware configuration as Ubuntu 24.04, the driver works fine.

zophos commented 6 months ago

I found the following statement in the Kernel 6.8 release announcement. This may be related to this problem.

      net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map
      net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context

https://lore.kernel.org/lkml/CAHk-=wiehc0DfPtL6fC2=bFuyzkTnuiuYSQrr6JTQxQao6pq1Q@mail.gmail.com/

bondagit commented 6 months ago

I just fixed a couple of kernel Undefined Behavior Sanitizer issues and added debug prints for PTP messages. Can you please retry your tests using the module in the branch aes67-daemon-issue-18 and report back the kernel ring buffer with sudo dmesg?

zophos commented 6 months ago

Below is dmesg just after insmod on Ubuntu 24.04 (kernel 6.8.0-31.generic).

[   50.160106] MergingRavennaALSA: loading out-of-tree module taints kernel.
[   50.160111] MergingRavennaALSA: module verification failed: signature and/or required key missing - tainting kernel
[   50.168818] mr_alsa_audio_preallocate_memory: allocated playback buffer of 100663296 bytes vmalloc requested
[   50.180910] mr_alsa_audio_preallocate_memory: allocated capture buffer of 100663296 bytes vmalloc requested
[   50.186867] mr_alsa_audio_pcm_open: get_master_switch_value error
[   50.186868] Register ALSA driver into Ravenna Peer...
[   50.186997] snd_merging_rav snd_merging_rav.0: mr_alsa_audio snd_card_register successful
[   50.187006] Base period set to 1333333 ns
[   50.187006] Merging RAVENNA ALSA module installed
[   50.198940] entering mr_alsa_audio_pcm_open (substream name=subdevice #0 #0) ...
[   50.198943] mr_alsa_audio_pcm_open: capture format nb bits range: [16, 32]
[   50.198944] mr_alsa_audio_pcm_open: capture period size range: [1024, 524288], periods range: [2, 48]
[   50.198945] Current PTPFrame Size = 512, minPTPFrameSize = 512, maxPTPFrameSize = 1024
[   50.198991] mr_alsa_audio_pcm_hw_params (enter): rate=48000 format=2 channels=2 period_size=512, nb_periods=9
               , buffer_bytes=18432
[   50.198993] mr_alsa_audio_pcm_hw_params (capture): wrong nbPeriods (9 instead of 96)...
[   50.199000] mr_alsa_audio_pcm_hw_params done: rate=48000 format=2 channels=2 period_size=512, nb_periods=9
               , buffer_bytes=18432
[   50.199007] entering mr_alsa_audio_pcm_prepare (substream name=subdevice #0 #0) ...
[   50.199008] mr_alsa_audio_pcm_prepare: rate=48000 format=2 channels=2 period_size=512, nb periods=9
[   50.199010] mr_alsa_audio_pcm_prepare for capture stream
[   50.199010] mr_alsa_audio_pcm_prepare for capture stream failed
[   50.199018] entering mr_alsa_audio_pcm_hw_free (substream name=subdevice #0 #0) ...
[   50.199025] mr_alsa_audio_pcm_hw_params (enter): rate=48000 format=2 channels=2 period_size=512, nb_periods=9
               , buffer_bytes=1843

 :  #Repeat below

bondagit commented 6 months ago

did you start the daemon ? do you have a master clock on the network ?

zophos commented 6 months ago

I don't start aes67-daemon. There is Dante AVIO-USB as the master clock on my network.

bondagit commented 6 months ago

I don't start aes67-daemon. I have the master clock as Dante AVIO-USB.

you have to start the daemon to check for PTP master clock locking

zophos commented 6 months ago

I started aes67-daemon and got the following additional messages. PTP is still unlocked.

: (snip)

[   50.227686] entering mr_alsa_audio_pcm_close (substream name=subdevice #0 #0) ...
[   50.227707] entering mr_alsa_audio_pcm_open (substream name=subdevice #0 #0) ...
[   50.227708] mr_alsa_audio_pcm_open: capture format nb bits range: [16, 32]
[   50.227708] mr_alsa_audio_pcm_open: capture period size range: [1024, 524288], periods range: [2, 48]
[   50.227709] Current PTPFrame Size = 512, minPTPFrameSize = 512, maxPTPFrameSize = 1024
[   50.227733] entering mr_alsa_audio_pcm_close (substream name=subdevice #0 #0) ...
[   55.275188] nf_hook_func first message received
[ 1401.899944] Hello Mr RAV ALSA Daemon
[ 1401.900010] Base period set to 11609977 ns
[ 1401.900045] Base period set to 1088435 ns
[ 1401.900070] Base period set to 1088435 ns
[ 1432.588473] Bye Mr RAV ALSA Daemon
[ 1500.552466] Hello Mr RAV ALSA Daemon
[ 1500.552715] Base period set to 1088435 ns
[ 1500.552834] Base period set to 1088435 ns
[ 1500.552887] Base period set to 1088435 ns

bondagit commented 6 months ago

Did you use the driver from the branch aes67-daemon-issue-18? If yes, the driver is not receiving any PTP packets and we need to check the kernel configuration in /boot/config-$(uname -r). By running tcpdump on the same Linux host you can check whether you are receiving the PTP master traffic, for example with: sudo tcpdump -vv -i eth0 dst 224.0.1.129

bondagit commented 6 months ago

I could finally reproduce the issue by installing Ubuntu 24.04 on a VM. The problem is related to a corrupted network interface name returned in the name field of the net_device structure used by the netfilter hook in the driver. The interface name is used to discard all packets not coming from the configured network interface. Since the input interface name is corrupted, all the packets get filtered out; see EtherTubeNetfilter.c:270. The problem doesn't seem to be related to the driver code. Are you also using a VM, or are you using a physical host?
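
The effect of a corrupted name on this check can be sketched in user space. This is only an illustration, not the driver code: the condition follows the one at EtherTubeNetfilter.c:270, while the function name should_drop_packet is hypothetical.

```c
#include <string.h>

/*
 * Mirrors the interface-name check in rx_packet(): a packet is discarded
 * when the hook reports a non-empty interface name that differs from the
 * configured one. Returns 1 to drop the packet, 0 to keep it.
 * The function name is hypothetical.
 */
int should_drop_packet(const char *ifname, const char *configured_ifname)
{
    return strlen(ifname) != 0 && strcmp(ifname, configured_ifname) != 0;
}
```

With a valid name, should_drop_packet("enp2s0", "enp2s0") keeps the packet, and an empty name bypasses the filter. A garbage name is non-empty and never equals the configured one, so every packet, including the PTP master traffic, is dropped.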

cedonik commented 6 months ago

If you really read what we post here, you will see that I've replied that linux-ptp software is perfectly working in client and server mode. So there is no network filtering in my case on all HARDWARE machines that i have tried.

I'll check new branch now and check for kernel config...

bondagit commented 6 months ago

If you really read what we post here, you will see that I've replied that linux-ptp software is perfectly working in client and server mode. So there is no network filtering in my case on all HARDWARE machines that i have tried.

I'll check new branch now and check for kernel config...

The MergingRavennaALSA module uses the Linux kernel netfilter to receive the PTP master packets, but because of the issue described above these packets get discarded. This is the reason PTP is not locking for the daemon. I also verified that the PTP packets are correctly received by the host, which is why linux-ptp works properly.

bondagit commented 6 months ago

Just disable the interface name check in the driver and it works as expected.

--- a/driver/EtherTubeNetfilter.c
+++ b/driver/EtherTubeNetfilter.c
@@ -270,7 +270,7 @@ int rx_packet(TEtherTubeNetfilter* self, void* packet, int packet_size, const ch
         if (strlen(ifname) != 0 && strcmp(ifname, self->ifname_used_) != 0)
         {
             //MTAL_DP_INFO("2: %s, %s\n", ifname, self->ifname_used_);
-            return 1;
+            //return 1;
         }
         if (packet == NULL)
         {

zophos commented 6 months ago

Yes, I'm using aes67-daemon-issue-18 branch driver.

Here are results of grep PTP /boot/config-6.8.0-31-generic. Pls forgive the mixing of unrelated lines.

CONFIG_OPTPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NET_DSA_MICROCHIP_KSZ_PTP=y
CONFIG_NET_DSA_MV88E6XXX_PTP=y
CONFIG_NET_DSA_SJA1105_PTP=y
CONFIG_CAVIUM_PTP=m
CONFIG_BCM_NET_PHYPTP=m
CONFIG_PPTP=m
# PTP clock support
CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_OPTIONAL=y
CONFIG_PTP_1588_CLOCK_INES=m
CONFIG_PTP_1588_CLOCK_KVM=m
CONFIG_PTP_1588_CLOCK_IDT82P33=m
CONFIG_PTP_1588_CLOCK_IDTCM=m
CONFIG_PTP_1588_CLOCK_MOCK=m
CONFIG_PTP_1588_CLOCK_VMW=m
CONFIG_PTP_1588_CLOCK_OCP=m
CONFIG_PTP_DFL_TOD=m
# end of PTP clock support

The following is the result of tcpdump. 192.168.1.3 is the PTP master clock (Dante AVIO-USB). aes67-daemon is running on this machine (but PTP is still not locking).

23:17:42.968101 IP (tos 0xb8, ttl 16, id 453, offset 0, flags [none], proto UDP (17), length 72)
    192.168.1.3.49158 > ptp-primary.mcast.net.ptp-event: [udp sum ok] PTPv2, v1 compat : no, msg type : sync msg, length : 44, domain : 0, reserved1 : 0, Flags [two step], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x1dc1fffe5450fc, port id : 3, seq id : 57178, control : 0 (Sync), log message interval : 254, originTimeStamp : 358371 seconds, 265202700 nanoseconds
23:17:42.968118 IP (tos 0xb8, ttl 16, id 454, offset 0, flags [none], proto UDP (17), length 72)
    192.168.1.3.49159 > ptp-primary.mcast.net.ptp-general: [udp sum ok] PTPv2, v1 compat : no, msg type : follow up msg, length : 44, domain : 0, reserved1 : 0, Flags [none], NS correction : 0, sub NS correction : 0, reserved2 : 0, clock identity : 0x1dc1fffe5450fc, port id : 3, seq id : 57178, control : 2 (Follow_Up), log message interval : 254, preciseOriginTimeStamp : 358371 seconds, 265259922 nanoseconds
23:17:43.007116 IP (tos 0xe0, ttl 1, id 455, offset 0, flags [none], proto UDP (17), length 152)
    192.168.1.3.49156 > ptp-primary.mcast.net.ptp-event: [udp sum ok] PTPv1 (not implemented)
23:17:43.007165 IP (tos 0xb8, ttl 1, id 456, offset 0, flags [none], proto UDP (17), length 80)
    192.168.1.3.49157 > ptp-primary.mcast.net.ptp-general: [udp sum ok] PTPv1 (not implemented)
23:17:43.218103 IP (tos 0xb8, ttl 16, id 457, offset 0, flags [none], proto UDP (17), length 72)

Is there any other information needed?

bondagit commented 6 months ago

Yes, I'm using aes67-daemon-issue-18 branch driver.

just try to modify the driver as reported below and the problem should be fixed. This worked for me.


--- a/driver/EtherTubeNetfilter.c
+++ b/driver/EtherTubeNetfilter.c
@@ -270,7 +270,7 @@ int rx_packet(TEtherTubeNetfilter* self, void* packet, int packet_size, const ch
         if (strlen(ifname) != 0 && strcmp(ifname, self->ifname_used_) != 0)
         {
             //MTAL_DP_INFO("2: %s, %s\n", ifname, self->ifname_used_);
-            return 1;
+            //return 1;
         }
         if (packet == NULL)
         {

zophos commented 6 months ago

Ah, we just missed each other :) Tnx. I'm trying. It seems working fine.

bondagit commented 6 months ago

Tnx. I'm trying. It seems working fine.

this patch is just a temporary fix as I cannot merge it into the driver main because we need to continue filtering the packets received by network interface name.

zophos commented 6 months ago

this patch is just a temporary fix as I cannot merge it into the driver main because we need to continue filtering the packets received by network interface name.

Yeah, I understand the situation. For now, I'll stop outputting debug messages and apply this patch. I'll try to come up with a solution too.

bondagit commented 6 months ago

For now, I'll stop outputting debug messages and apply this patch. I'll try to come up with a solution too.

we could just avoid filtering by interface in case the interface name received contains a non-ascii character. What do you think ? I see that at present filtering is disabled in case the interface name received is empty.
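
A minimal user-space sketch of this idea, assuming the filter is skipped whenever the received name is empty or contains a byte outside printable ASCII. The function names are hypothetical, and this is not necessarily how the driver ends up implementing it.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if every byte of the name is printable,
 * non-space ASCII (0x21..0x7E), 0 otherwise.
 */
int ifname_is_printable_ascii(const char *ifname)
{
    const unsigned char *p = (const unsigned char *)ifname;

    for (; *p != '\0'; ++p) {
        if (*p < 0x21 || *p > 0x7E)
            return 0;
    }
    return 1;
}

/*
 * Relaxed variant of the interface-name check: filter by name only when
 * the received name looks usable. Returns 1 to drop the packet, 0 to keep it.
 */
int should_drop_packet_relaxed(const char *ifname, const char *configured_ifname)
{
    if (strlen(ifname) == 0 || !ifname_is_printable_ascii(ifname))
        return 0; /* name empty or corrupted: do not filter on it */

    return strcmp(ifname, configured_ifname) != 0;
}
```

The trade-off is that a corrupted name which happens to consist only of printable bytes would still be compared and rejected, and that filtering is effectively off while the name is corrupted. Kernel code would also want to bound the scan by IFNAMSIZ rather than rely on a terminating NUL.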

zophos commented 6 months ago

I see that at present filtering is disabled in case the interface name received is empty.

I think your idea is good for ad-hoc.

I haven't been able to track it down properly yet, but I have a feeling that ifname_used_ is not set to the correct name in kernel 6.8. But I'm drunk now, so that's all I can do today :P

zophos commented 6 months ago

I tried replacing MTAL_DP_INFO() with printk in EtherTubeNetfilter.c line 272, and as you said, I am getting non-ascii strings in ifname. And this is changing from time to time as shown below.

[96086.552417] 2: P\x91Ǐ^M\x92\xff\xff, enp2s0
[96086.591381] 2: P\x91Ǐ^M\x92\xff\xff, enp2s0
[96086.591421] 2: P\x91Ǐ^M\x92\xff\xff, enp2s0
[96086.681044] 2: P\xd1\xff\x8e^M\x92\xff\xff, enp2s0
[96086.801359] 2: P\x91Ǐ^M\x92\xff\xff, enp2s0
[96086.802275] 2: P\x91Ǐ^M\x92\xff\xff, enp2s0

It's a pain, but I think we need to go through the kernel's ChangeLog and check whether the definition of the struct net_device that is passed into the netfilter hook has changed.

Maybe we stepped on a nasty kernel bug X(

cedonik commented 6 months ago

Just disable the interface name check in the driver and it works as expected.

I can confirm that this workaround also helps on arch linux... Now it is locking!

Thanks for your effort! Cedo

bondagit commented 6 months ago

I implemented a patch and added it to the driver version v1.9. According to my tests this works on all platforms.

zophos commented 6 months ago

I tried to use in->name_node->name instead of in->name in nf_hook_func() in module_interface.c, but this one is also broken.