Closed zajdee closed 1 year ago
The problem looks similar to #382, but when I cherry-picked 8980f79d0dc2d7f70dcfe48308a37bfe7e5b1cb2 and built Jool from the latest git source, I get some really strange len
/data_len
values:
[Wed Apr 19 19:51:19 2023] =================
[Wed Apr 19 19:51:19 2023] EPERM packet
[Wed Apr 19 19:51:19 2023] skb fields:
[Wed Apr 19 19:51:19 2023] prev:0000000000000000
[Wed Apr 19 19:51:19 2023] next:0000000000000000
[Wed Apr 19 19:51:19 2023] dev:00000000637f58f8
[Wed Apr 19 19:51:19 2023] len:3781078265
[Wed Apr 19 19:51:19 2023] data_len:471037066
[Wed Apr 19 19:51:19 2023] mac_len:0
[Wed Apr 19 19:51:19 2023] hdr_len:0
[Wed Apr 19 19:51:19 2023] truesize:768
[Wed Apr 19 19:51:19 2023] pkt_type:0
[Wed Apr 19 19:51:19 2023] ignore_df:0
[Wed Apr 19 19:51:19 2023] ip_summed:0 (CHECKSUM_NONE)
[Wed Apr 19 19:51:19 2023] csum_valid:0
[Wed Apr 19 19:51:19 2023] csum_start:0
[Wed Apr 19 19:51:19 2023] csum_offset:0
[Wed Apr 19 19:51:19 2023] mark:0
[Wed Apr 19 19:51:19 2023] inner_transport_header:0
[Wed Apr 19 19:51:19 2023] inner_network_header:0
[Wed Apr 19 19:51:19 2023] inner_mac_header:0
[Wed Apr 19 19:51:19 2023] protocol:2048 (IPv4)
[Wed Apr 19 19:51:19 2023] transport_header:118
[Wed Apr 19 19:51:19 2023] network_header:98
[Wed Apr 19 19:51:19 2023] mac_header:84
[Wed Apr 19 19:51:19 2023] head:00000000e16b2302
[Wed Apr 19 19:51:19 2023] data:84
[Wed Apr 19 19:51:19 2023] tail:158
[Wed Apr 19 19:51:19 2023] end:192
[Wed Apr 19 19:51:19 2023] Content:
[Wed Apr 19 19:51:19 2023] IPv4 header:
[Wed Apr 19 19:51:19 2023] [Truncated]
[Wed Apr 19 19:51:19 2023] shared info:
[Wed Apr 19 19:51:19 2023] nr_frags:0
[Wed Apr 19 19:51:19 2023] tx_flags:0
[Wed Apr 19 19:51:19 2023] gso_size:0
[Wed Apr 19 19:51:19 2023] gso_segs:0
[Wed Apr 19 19:51:19 2023] gso_type:0
[Wed Apr 19 19:51:19 2023] frags:
[Wed Apr 19 19:51:18 2023] Jool NAT64/95be8000/default: Sending packet.
[Wed Apr 19 19:51:18 2023] =================
[Wed Apr 19 19:51:18 2023] EPERM packet
[Wed Apr 19 19:51:18 2023] skb fields:
[Wed Apr 19 19:51:18 2023] prev:0000000000000000
[Wed Apr 19 19:51:18 2023] next:0000000000000000
[Wed Apr 19 19:51:18 2023] dev:00000000637f58f8
[Wed Apr 19 19:51:18 2023] len:1406893561
[Wed Apr 19 19:51:18 2023] data_len:483258680
[Wed Apr 19 19:51:18 2023] mac_len:0
[Wed Apr 19 19:51:18 2023] hdr_len:0
[Wed Apr 19 19:51:18 2023] truesize:824
[Wed Apr 19 19:51:18 2023] pkt_type:0
[Wed Apr 19 19:51:18 2023] ignore_df:0
[Wed Apr 19 19:51:18 2023] ip_summed:0 (CHECKSUM_NONE)
[Wed Apr 19 19:51:18 2023] csum_valid:0
[Wed Apr 19 19:51:18 2023] csum_start:0
[Wed Apr 19 19:51:18 2023] csum_offset:0
[Wed Apr 19 19:51:18 2023] mark:0
[Wed Apr 19 19:51:18 2023] inner_transport_header:0
[Wed Apr 19 19:51:18 2023] inner_network_header:0
[Wed Apr 19 19:51:18 2023] inner_mac_header:0
[Wed Apr 19 19:51:18 2023] protocol:2048 (IPv4)
[Wed Apr 19 19:51:18 2023] transport_header:118
[Wed Apr 19 19:51:18 2023] network_header:98
[Wed Apr 19 19:51:18 2023] mac_header:84
[Wed Apr 19 19:51:18 2023] head:000000004d09d961
[Wed Apr 19 19:51:18 2023] data:84
[Wed Apr 19 19:51:18 2023] tail:126
[Wed Apr 19 19:51:18 2023] end:192
[Wed Apr 19 19:51:18 2023] Content:
[Wed Apr 19 19:51:18 2023] IPv4 header:
[Wed Apr 19 19:51:18 2023] Version: 4
[Wed Apr 19 19:51:18 2023] IHL: 5
[Wed Apr 19 19:51:18 2023] TOS: 0
[Wed Apr 19 19:51:18 2023] Total Length: 84
[Wed Apr 19 19:51:18 2023] Fragment ID: 60892
[Wed Apr 19 19:51:18 2023] Reserved:0 DF:0 MF:0 FragmentOffset: 0
[Wed Apr 19 19:51:18 2023] TTL: 62
[Wed Apr 19 19:51:18 2023] Protocol: 1
[Wed Apr 19 19:51:18 2023] Checksum: 0x3ccb
[Wed Apr 19 19:51:18 2023] Source Address: 192.0.2.1
[Wed Apr 19 19:51:18 2023] Destination Address: 85.207.58.49
[Wed Apr 19 19:51:18 2023] ICMPv4 header:
[Wed Apr 19 19:51:18 2023] Type:8 Code:0
[Wed Apr 19 19:51:18 2023] Checksum: 0xc7df
[Wed Apr 19 19:51:18 2023] Rest 1: 64816
[Wed Apr 19 19:51:18 2023] Rest 2: 64
[Wed Apr 19 19:51:18 2023] Payload:
[Wed Apr 19 19:51:18 2023] ff ff ff 00 7e (...)
[Wed Apr 19 19:51:18 2023] [Truncated]
[Wed Apr 19 19:51:18 2023] shared info:
[Wed Apr 19 19:51:18 2023] nr_frags:1
[Wed Apr 19 19:51:18 2023] tx_flags:0
[Wed Apr 19 19:51:18 2023] gso_size:0
[Wed Apr 19 19:51:18 2023] gso_segs:0
[Wed Apr 19 19:51:18 2023] gso_type:0
[Wed Apr 19 19:51:18 2023] frags:
[Wed Apr 19 19:51:18 2023] 0 page_offset:2110 size:56
[Wed Apr 19 19:53:23 2023] Jool NAT64/95be8000/default: Sending packet.
[Wed Apr 19 19:53:23 2023] =================
[Wed Apr 19 19:53:23 2023] EPERM packet
[Wed Apr 19 19:53:23 2023] skb fields:
[Wed Apr 19 19:53:23 2023] prev:0000000000000000
[Wed Apr 19 19:53:23 2023] next:0000000000000000
[Wed Apr 19 19:53:23 2023] dev:00000000637f58f8
[Wed Apr 19 19:53:23 2023] len:1450223609
[Wed Apr 19 19:53:23 2023] data_len:471620157
[Wed Apr 19 19:53:23 2023] mac_len:0
[Wed Apr 19 19:53:23 2023] hdr_len:0
[Wed Apr 19 19:53:23 2023] truesize:768
[Wed Apr 19 19:53:23 2023] pkt_type:0
[Wed Apr 19 19:53:23 2023] ignore_df:0
[Wed Apr 19 19:53:23 2023] ip_summed:0 (CHECKSUM_NONE)
[Wed Apr 19 19:53:23 2023] csum_valid:0
[Wed Apr 19 19:53:23 2023] csum_start:0
[Wed Apr 19 19:53:23 2023] csum_offset:0
[Wed Apr 19 19:53:23 2023] mark:0
[Wed Apr 19 19:53:23 2023] inner_transport_header:0
[Wed Apr 19 19:53:23 2023] inner_network_header:0
[Wed Apr 19 19:53:23 2023] inner_mac_header:0
[Wed Apr 19 19:53:23 2023] protocol:2048 (IPv4)
[Wed Apr 19 19:53:23 2023] transport_header:118
[Wed Apr 19 19:53:23 2023] network_header:98
[Wed Apr 19 19:53:23 2023] mac_header:84
[Wed Apr 19 19:53:23 2023] head:0000000037ef5405
[Wed Apr 19 19:53:23 2023] data:84
[Wed Apr 19 19:53:23 2023] tail:158
[Wed Apr 19 19:53:23 2023] end:192
[Wed Apr 19 19:53:23 2023] Content:
[Wed Apr 19 19:53:23 2023] IPv4 header:
[Wed Apr 19 19:53:23 2023] Version: 4
[Wed Apr 19 19:53:23 2023] IHL: 5
[Wed Apr 19 19:53:23 2023] TOS: 0
[Wed Apr 19 19:53:23 2023] Total Length: 60
[Wed Apr 19 19:53:23 2023] Fragment ID: 9524
[Wed Apr 19 19:53:23 2023] Reserved:0 DF:0 MF:0 FragmentOffset: 0
[Wed Apr 19 19:53:23 2023] TTL: 62
[Wed Apr 19 19:53:23 2023] Protocol: 6
[Wed Apr 19 19:53:23 2023] Checksum: 0x587
[Wed Apr 19 19:53:23 2023] Source Address: 192.0.2.1
[Wed Apr 19 19:53:23 2023] Destination Address: 85.207.58.49
[Wed Apr 19 19:53:23 2023] TCP header:
[Wed Apr 19 19:53:23 2023] Src Port: 64566
[Wed Apr 19 19:53:23 2023] Dst Port: 443
[Wed Apr 19 19:53:23 2023] Seq Number: 940432038
[Wed Apr 19 19:53:23 2023] ACK Seq: 0
[Wed Apr 19 19:53:23 2023] ACK:0 RST:0 SYN:1 FIN:0
[Wed Apr 19 19:53:23 2023] [Other flags ommitted]
[Wed Apr 19 19:53:23 2023] Window Size: 42600
[Wed Apr 19 19:53:23 2023] Checksum: 0x42a9
[Wed Apr 19 19:53:23 2023] Urgent Pointer: 0
[Wed Apr 19 19:53:23 2023] Payload:
[Wed Apr 19 19:53:23 2023] 02 04 05 8c 04 (...)
[Wed Apr 19 19:53:23 2023] [Truncated]
[Wed Apr 19 19:53:23 2023] shared info:
[Wed Apr 19 19:53:23 2023] nr_frags:0
[Wed Apr 19 19:53:23 2023] tx_flags:0
[Wed Apr 19 19:53:23 2023] gso_size:0
[Wed Apr 19 19:53:23 2023] gso_segs:0
[Wed Apr 19 19:53:23 2023] gso_type:0
[Wed Apr 19 19:53:23 2023] frags:
Is there any known issue with these newer kernels? I have checked and there is no newer version of Jool available on Jool.mx.
Well, for one thing, jool.mx is now a complete liability. I lost control on it about a year ago. Please use the Github version instead and update your bookmarks (if any).
The latest Jool is 4.1.9, not 4.1.7. Looking at the compatibility table, 4.1.7 is indeed not necessarily expected to run properly on kernel 5.19. Might want to try updating first.
@ydahhrk thank you. It might not have been clear from my last message, however I have already tried building the code from the latest git master + the cherry-picked commit to print packet details. It still misbehaves.
Ah. Sorry about that.
Well, reading #382 again, we stopped looking at those packet lengths because 8980f79d0dc2d7f70dcfe48308a37bfe7e5b1cb2 had a bug I later patched in 492e4627bf91921c769a942b641dcb318b739f30, which was (presumably) causing them to break. (Although it IS very strange that they're the only packet fields that break.) But either way, I'd recommend sticking to the latest commit of the debug branch, rather than cherry-picking individual debug commits.
But, assuming those packet lengths are still breaking... well, I haven't been able to find where this is happening, but it should be relatively easy to pinpoint by monitoring them.
I just uploaded a commit (to the same debug branch) that checks the length in a bunch of locations. Try getting the error, then provide lines from dmesg
that look somewhat like this:
[ 634.533214] /home/al/git/jool/src/mod/common/rfc7915/6to4.c:536 (ttp64_alloc_skb()): skb length is 1450223609.
[ 634.538311] /home/al/git/jool/src/mod/common/rfc7915/6to4.c:536 (ttp64_alloc_skb()): skb data_len is 471620157.
These lines are printed as soon as the code detects the packet lengths have been corrupted. Each errored packet should only print each line once, so it shouldn't flood the logs too badly.
I have built the code in the issue382
branch, installed it, rmmod'd/modprobed the Jool modules and re-initialized the configuration, the error is still there, but there's neither skb length
nor skb data_len
in my dmesg
output. That's weird.
root@router:/home/work/Jool# dkms status
jool/4.1.9.git.v4.1.9.6.g7199f357, 5.19.0-38-generic, x86_64: installed
But now I'm getting EPERM in
as well as EPERM out
errors.
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: ===============================================
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Packet: <lan-v6-prefix>::6->fd00:64::55cf:3a31
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: TCP 38242->443
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Step 1: Determining the Incoming Tuple
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Tuple: <lan-v6-prefix>::6#38242 -> fd00:64::55cf:3a31#443 (TCP)
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Done step 1.
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Step 2: Filtering and Updating
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: BIB entry: <lan-v6-prefix>::6#38242 - 192.0.2.1#64374 (TCP)
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Session entry: <lan-v6-prefix>::6#38242 - fd00:64::55cf:3a31#443 | 192.0.2.1#64374 - 85.207.58.49#443 (TCP)
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Done: Step 2.
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Step 3: Computing the Outgoing Tuple
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Tuple: 192.0.2.1#64374 -> 85.207.58.49#443 (TCP)
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Done step 3.
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Step 4: Translating the Packet
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Routing: 192.0.2.1->85.207.58.49
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Packet routed via device 'from_jool'.
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Done step 4.
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: Sending packet.
[Sat Apr 22 21:21:23 2023] =================
[Sat Apr 22 21:21:23 2023] EPERM in packet
[Sat Apr 22 21:21:23 2023] skb fields:
[Sat Apr 22 21:21:23 2023] prev:0000000000000000
[Sat Apr 22 21:21:23 2023] next:0000000000000000
[Sat Apr 22 21:21:23 2023] dev:00000000637f58f8
[Sat Apr 22 21:21:23 2023] len:80
[Sat Apr 22 21:21:23 2023] data_len:0
[Sat Apr 22 21:21:23 2023] mac_len:14
[Sat Apr 22 21:21:23 2023] hdr_len:0
[Sat Apr 22 21:21:23 2023] truesize:3328
[Sat Apr 22 21:21:23 2023] pkt_type:0
[Sat Apr 22 21:21:23 2023] ignore_df:0
[Sat Apr 22 21:21:23 2023] ip_summed:1 (CHECKSUM_UNNECESSARY)
[Sat Apr 22 21:21:23 2023] csum_valid:0
[Sat Apr 22 21:21:23 2023] csum_start:0
[Sat Apr 22 21:21:23 2023] csum_offset:0
[Sat Apr 22 21:21:23 2023] mark:0
[Sat Apr 22 21:21:23 2023] inner_transport_header:0
[Sat Apr 22 21:21:23 2023] inner_network_header:0
[Sat Apr 22 21:21:23 2023] inner_mac_header:0
[Sat Apr 22 21:21:23 2023] protocol:34525 (IPv6)
[Sat Apr 22 21:21:23 2023] transport_header:118
[Sat Apr 22 21:21:23 2023] network_header:78
[Sat Apr 22 21:21:23 2023] mac_header:64
[Sat Apr 22 21:21:23 2023] head:000000000e388681
[Sat Apr 22 21:21:23 2023] data:78
[Sat Apr 22 21:21:23 2023] tail:158
[Sat Apr 22 21:21:23 2023] end:704
[Sat Apr 22 21:21:23 2023] Content:
[Sat Apr 22 21:21:23 2023] IPv6 header:
[Sat Apr 22 21:21:23 2023] Version: 6
[Sat Apr 22 21:21:23 2023] Priority: 0
[Sat Apr 22 21:21:23 2023] Flow Label: 0x0faf24
[Sat Apr 22 21:21:23 2023] Payload Length: 40
[Sat Apr 22 21:21:23 2023] Next Header: 6
[Sat Apr 22 21:21:23 2023] Hop Limit: 63
[Sat Apr 22 21:21:23 2023] Source Address: <lan-v6-prefix>::6
[Sat Apr 22 21:21:23 2023] Destination Address: fd00:64::55cf:3a31
[Sat Apr 22 21:21:23 2023] TCP header:
[Sat Apr 22 21:21:23 2023] Src Port: 38242
[Sat Apr 22 21:21:23 2023] Dst Port: 443
[Sat Apr 22 21:21:23 2023] Seq Number: 860452114
[Sat Apr 22 21:21:23 2023] ACK Seq: 0
[Sat Apr 22 21:21:23 2023] ACK:0 RST:0 SYN:1 FIN:0
[Sat Apr 22 21:21:23 2023] [Other flags ommitted]
[Sat Apr 22 21:21:23 2023] Window Size: 42960
[Sat Apr 22 21:21:23 2023] Checksum: 0x9247
[Sat Apr 22 21:21:23 2023] Urgent Pointer: 0
[Sat Apr 22 21:21:23 2023] Payload:
[Sat Apr 22 21:21:23 2023] 02 04 05 98 04 (...) 00 01 03 03 0a
[Sat Apr 22 21:21:23 2023] shared info:
[Sat Apr 22 21:21:23 2023] nr_frags:0
[Sat Apr 22 21:21:23 2023] tx_flags:0
[Sat Apr 22 21:21:23 2023] gso_size:0
[Sat Apr 22 21:21:23 2023] gso_segs:0
[Sat Apr 22 21:21:23 2023] gso_type:0
[Sat Apr 22 21:21:23 2023] frags:
[Sat Apr 22 21:21:23 2023] out packet index: 0
[Sat Apr 22 21:21:23 2023] state flags: 1
[Sat Apr 22 21:21:23 2023] =================
[Sat Apr 22 21:21:23 2023] EPERM out packet
[Sat Apr 22 21:21:23 2023] skb fields:
[Sat Apr 22 21:21:23 2023] prev:0000000000000000
[Sat Apr 22 21:21:23 2023] next:0000000000000000
[Sat Apr 22 21:21:23 2023] dev:00000000637f58f8
[Sat Apr 22 21:21:23 2023] len:60
[Sat Apr 22 21:21:23 2023] data_len:0
[Sat Apr 22 21:21:23 2023] mac_len:0
[Sat Apr 22 21:21:23 2023] hdr_len:0
[Sat Apr 22 21:21:23 2023] truesize:768
[Sat Apr 22 21:21:23 2023] pkt_type:0
[Sat Apr 22 21:21:23 2023] ignore_df:0
[Sat Apr 22 21:21:23 2023] ip_summed:0 (CHECKSUM_NONE)
[Sat Apr 22 21:21:23 2023] csum_valid:0
[Sat Apr 22 21:21:23 2023] csum_start:0
[Sat Apr 22 21:21:23 2023] csum_offset:0
[Sat Apr 22 21:21:23 2023] mark:0
[Sat Apr 22 21:21:23 2023] inner_transport_header:0
[Sat Apr 22 21:21:23 2023] inner_network_header:0
[Sat Apr 22 21:21:23 2023] inner_mac_header:0
[Sat Apr 22 21:21:23 2023] protocol:2048 (IPv4)
[Sat Apr 22 21:21:23 2023] transport_header:118
[Sat Apr 22 21:21:23 2023] network_header:98
[Sat Apr 22 21:21:23 2023] mac_header:98
[Sat Apr 22 21:21:23 2023] head:000000006ca4627b
[Sat Apr 22 21:21:23 2023] data:98
[Sat Apr 22 21:21:23 2023] tail:158
[Sat Apr 22 21:21:23 2023] end:192
[Sat Apr 22 21:21:23 2023] Content:
[Sat Apr 22 21:21:23 2023] IPv4 header:
[Sat Apr 22 21:21:23 2023] Version: 4
[Sat Apr 22 21:21:23 2023] IHL: 5
[Sat Apr 22 21:21:23 2023] TOS: 0
[Sat Apr 22 21:21:23 2023] Total Length: 60
[Sat Apr 22 21:21:23 2023] Fragment ID: 48814
[Sat Apr 22 21:21:23 2023] Reserved:0 DF:0 MF:0 FragmentOffset: 0
[Sat Apr 22 21:21:23 2023] TTL: 62
[Sat Apr 22 21:21:23 2023] Protocol: 6
[Sat Apr 22 21:21:23 2023] Checksum: 0x6c0c
[Sat Apr 22 21:21:23 2023] Source Address: 192.0.2.1
[Sat Apr 22 21:21:23 2023] Destination Address: 85.207.58.49
[Sat Apr 22 21:21:23 2023] TCP header:
[Sat Apr 22 21:21:23 2023] Src Port: 64374
[Sat Apr 22 21:21:23 2023] Dst Port: 443
[Sat Apr 22 21:21:23 2023] Seq Number: 860452114
[Sat Apr 22 21:21:23 2023] ACK Seq: 0
[Sat Apr 22 21:21:23 2023] ACK:0 RST:0 SYN:1 FIN:0
[Sat Apr 22 21:21:23 2023] [Other flags ommitted]
[Sat Apr 22 21:21:23 2023] Window Size: 42960
[Sat Apr 22 21:21:23 2023] Checksum: 0x96fe
[Sat Apr 22 21:21:23 2023] Urgent Pointer: 0
[Sat Apr 22 21:21:23 2023] Payload:
[Sat Apr 22 21:21:23 2023] 02 04 05 98 04 (...) 00 01 03 03 0a
[Sat Apr 22 21:21:23 2023] shared info:
[Sat Apr 22 21:21:23 2023] nr_frags:0
[Sat Apr 22 21:21:23 2023] tx_flags:0
[Sat Apr 22 21:21:23 2023] gso_size:0
[Sat Apr 22 21:21:23 2023] gso_segs:0
[Sat Apr 22 21:21:23 2023] gso_type:0
[Sat Apr 22 21:21:23 2023] frags:
[Sat Apr 22 21:21:23 2023] Jool NAT64/95be8000/default: dst_output() returned errcode 1.
[Sat Apr 22 21:21:23 2023] Jool: Dropping packet.
Now I have tried to update the NAT64 prefix to 64:ff9b::/96
, but with no luck, the result is still the same.
root@router:/home/work/Jool# jool global display
manually-enabled: true
pool6: 64:ff9b::/96
lowest-ipv6-mtu: 1500
logging-debug: false
zeroize-traffic-class: false
override-tos: false
tos: 0
mtu-plateaus: 65535,32000,17914,8166,4352,2002,1492,1006,508,296,68
address-dependent-filtering: false
drop-externally-initiated-tcp: false
drop-icmpv6-info: false
source-icmpv6-errors-better: true
f-args: 11 (0b1011): SrcAddr:1 SrcPort:0 DstAddr:1 DstPort:1
handle-rst-during-fin-rcv: false
tcp-est-timeout: 2:00:00 (HH:MM:SS)
tcp-trans-timeout: 0:04:00 (HH:MM:SS)
udp-timeout: 0:05:00 (HH:MM:SS)
icmp-timeout: 0:01:00 (HH:MM:SS)
logging-bib: false
logging-session: false
maximum-simultaneous-opens: 10
ss-enabled: false
ss-flush-asap: true
ss-flush-deadline: 2000
ss-capacity: 512
ss-max-payload: 1452
Just letting you know: It seems I'm now reproducing the problem reliably. I'll need to compile a custom kernel, so the patch might take a day, but it's probably coming.
tc qdisc add dev from_jool root fq
Wait. What is this? Everything works when I remove it.
Well... here's what I gathered:
It seems the "Fair Queue" packet scheduler is giving up here. And that would be because Jool's IPv4 outgoing packet timestamp (tstamp
) is comically large. This comparison is giving me numbers that tend to look like
1682383029042469928 > 217308362912 + 10000000000
And the reason why Jool's IPv4 outgoing packet timestamp is very large is... because the incoming IPv6 packet timestamp is very large. Jool simply copies that number.
So I'm very lost.
Does any of this make sense to you?
Thank you for your detailed investigation. I only had time to follow up on this issue today. I do confirm that when I use a different qdisc
than fq
, the traffic immediately starts flowing.
The horizon comparison was not in earlier kernels. It seems to have appeared in https://github.com/torvalds/linux/commit/39d010504e6b4485d7ceee167743620dd33f4417
From what I see in your post and in that kernel commit, 1682383029042469928
in your case seems to be a nanosecond-precision epoch timestamp. 1682383029
was Tuesday 25. April 2023 0:37:09
, likely the date and time of your debugging session. 10000000000
is the default 10 seconds FQ horizon and 217308362912
is the result of ktime_get_ns
, which is a nanosecond-precision timer that starts at kernel boot.
It doesn't make any sense to me to compare epoch timestamp to the kernel timer. That comparison will never succeed. skb
tstamp
seems to be defined as ktime_t
(https://github.com/torvalds/linux/blame/master/include/linux/skbuff.h#L870), it therefore doesn't make sense to set it to fill it with epoch timestamp.
Jool simply copies that number.
This sounds interesting. From where does Jool take the timestamp, or in which function does it copy it over (I may be bad at grepping, but I couldn't find it)? If it's provided by the kernel, then perhaps there's a bug in the kernel code somewhere.
I suppose the IPv4 timestamp uses a different epoch than the IPv6 timestamp. I'll try printing a few, see if the pattern is consistent.
From where does Jool take the timestamp, or in which function does it copy it over?
Here. Specifically, pskb_copy
> __pskb_copy
> __pskb_copy_fclone
> skb_copy_header
> __copy_skb_header
.
Theory doesn't stand:
$ ping -c2 fd00:64::192.0.2.8
$ ping -c2 192.0.2.1
$ sudo dmesg
[ 315.428181] IPv6 tstamp:1683060731253048982
[ 316.428485] IPv6 tstamp:1683060732253852448
[ 322.955888] IPv4 tstamp:1683060738784521621
[ 323.976784] IPv4 tstamp:1683060739805925360
This patch series implies skb->tstamp
is used for two different things in ingress and egress, which might account for the wildly different values seen?
Maybe simply clearing the timestamp (out->tstamp = 0
) after pskb_copy
is good enough? Or maybe that's inefficient and more sophisticated handling is required?
maybe that's inefficient and more sophisticated handling is required?
actually skb_clear_tstamp(out)
introduced in the referenced patch series might be exactly what is needed, but only available in Linux 5.18+
This patch series implies skb->tstamp is used for two different things in ingress and egress
actually
skb_clear_tstamp(out)
introduced in the referenced patch series might be exactly what is needed, but only available in Linux 5.18+
Makes sense. Previous to 5.18, both of the kernel's forwarding functions employed skb->tstamp = 0
to clean up the field, and 5.18 built upon it:
So yes, it would seem the kernel is expecting Jool to do the same.
Maybe simply clearing the timestamp (
out->tstamp = 0
) afterpskb_copy
is good enough? Or maybe that's inefficient and more sophisticated handling is required?
It seems to me the whole reason why the author of the patch states it's inefficient is because it breaks shd_fq
. Allegedly. Given that you people want FQ, it suggests skb->tstamp = 0
might not the best solution for you.
Then again, skb_clear_tstamp()
might have emerged because of complications resulting from the 5.18 refactors, not long-standing bugs. This would mean skb->tstamp = 0
is perfectly acceptable for older kernels.
I uploaded the patch to branch issue400. Might want to try it out.
It now works in my Debian 11, kernel 5.10, with FQ enabled. (Didn't before the patch.) I don't know if it compiles in the other kernels; will test that tomorrow.
I uploaded the patch to branch issue400. Might want to try it out.
It works for me on RHEL 8.7 (Kernel 4.18.0-425.19.2.el8_7.x86_64), thanks!
It seems to me the whole reason why the author of the patch states it's inefficient is because it breaks
shd_fq
. Allegedly. Given that you people want FQ, it suggestsskb->tstamp = 0
might not the best solution for you.
My interpretation is it is failure to clear tstamp at all that causes shd_fq
confusion - as we see here.
The ineffeciency was from setting tstamp = 0 when it doesn't need to be reset, ie. it is already on an egress path and contains a delivery_time type of timestamp, causing unnecessary recalculations. The patchset adds the necessary logic to avoid resetting in those cases, and skb_clear_tstamp()
is the way to leverage that logic for better efficiency on newer kernels.
So I don't agree that this "might not the best solution".
I do confirm that the patched version from the issue400
branch works well for me with fq
in place (Ubuntu 22.10, 5.19.0-38-generic
).
Guess it's ready for release then.
4.1.10 released; closing.
I am running a Jool.mx Netfilter instance in a separate namespace. This configuration was working for several years, but suddenly it is unable to translate packets. I have tested self-built DKMS module as well as the prebuilt 4.1.7 Debian packages, still no luck.
The error I am seeing in
dmesg
when enablingjool global update logging-debug true
is:The full packet dump from
dmesg
is:The setup: [IPv6, internal NAT64-enabled LAN] -> [IPv6, router main namespace, LAN port] -> [IPv6, namespace joolns] -> [jool] -> [IPv4, namespace joolns] -> [IPv4, router main namespace, NAT to the egress interface] -> [the Internet]
I see the IPv6 packets incoming in the
joolns
namespace, but the IPv4 packets never appear leaving the namespace (the in/out interface is just one).The init script:
(Outgoing NAT on the router is handled outside of this script.)
Interface config within the namespace:
Jool status:
I have tried setting
rp_filter
to 0, but I still have no luck and packets get dropped.uname -a
:Is there any known issue with these newer kernels? I have checked and there is no newer version of Jool available on Jool.mx.
Thank you.