NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Crash in find_bib_session6.constprop on 4.1.13 #426

Open mweinelt opened 1 month ago

mweinelt commented 1 month ago

Hi!

We recently used Jool for 464XLAT, exclusively with Android clients. They managed to trigger the following crash:

Oct 03 18:05:06 router kernel: ------------[ cut here ]------------
Oct 03 18:05:06 router kernel: BIB entry was and then wasn't in the v4 tree.
Oct 03 18:05:06 router kernel: WARNING: CPU: 1 PID: 0 at /build/source/src/mod/common/db/bib/db.c:1392 find_bib_session6.constprop.0+0x765/0x7b0 [jool_common]
Oct 03 18:05:06 router kernel: Modules linked in: sctp ip6_udp_tunnel udp_tunnel nf_conntrack_netlink jool(O) jool_common(O) af_packet dummy cfg80211 rfkill nft_nat nft_chain_nat nf_nat nft_limit nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c nls_iso8859_1 nls_cp437 vfat fat snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iTCO_wdt intel_pmc_bxt snd_hda_core watchdog sha512_ssse3 snd_hwdep sha256_ssse3 sha1_ssse3 snd_pcm i2c_i801 bochs snd_timer psmouse drm_vram_helper i2c_smbus drm_ttm_helper lpc_ich snd soundcore ttm intel_agp intel_gtt tiny_power_button evdev vmgenid joydev mousedev input_leds button led_class mac_hid serio_raw sch_fq_codel loop tun tap macvlan bridge stp llc fuse configfs efi_pstore nfnetlink efivarfs dmi_sysfs qemu_fw_cfg autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sd_mod t10_pi sr_mod hid_generic cdrom usbhid crc64_rocksoft crc64 crc_t10dif hid crct10dif_generic crct10dif_common ahci libahci libata virtio_net net_failover failover virtio_scsi atkbd scsi_mod libps2
Oct 03 18:05:06 router kernel:  vivaldi_fmap crc32c_intel scsi_common uhci_hcd ehci_pci virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev ehci_hcd i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring
Oct 03 18:05:06 router kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O       6.6.52 #1-NixOS
Oct 03 18:05:06 router kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2023.08-4 02/15/2024
Oct 03 18:05:06 router kernel: RIP: 0010:find_bib_session6.constprop.0+0x765/0x7b0 [jool_common]
Oct 03 18:05:06 router kernel: Code: 28 70 c1 49 8b 07 48 89 44 24 10 e9 c9 fc ff ff 48 c7 c7 e0 da ce c0 48 89 4c 24 18 4c 89 44 24 10 4c 89 0c 24 e8 bb ac 3f c1 <0f> 0b 48 8b 4c 24 18 4c 8b 44 24 10 4c 8b 0c 24 eb 92 b9 ef ff ff
Oct 03 18:05:06 router kernel: RSP: 0018:ffffc90000134948 EFLAGS: 00010246
Oct 03 18:05:06 router kernel: RAX: 0000000000000000 RBX: ffffc90000134a40 RCX: 0000000000000000
Oct 03 18:05:06 router kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 03 18:05:06 router kernel: RBP: ffff888106853608 R08: 0000000000000000 R09: 0000000000000000
Oct 03 18:05:06 router kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888107079888
Oct 03 18:05:06 router kernel: R13: ffff888101fc1b00 R14: ffffc90000134a30 R15: ffffc90000134a20
Oct 03 18:05:06 router kernel: FS:  0000000000000000(0000) GS:ffff88817ba80000(0000) knlGS:0000000000000000
Oct 03 18:05:06 router kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 03 18:05:06 router kernel: CR2: 000055cc8a219204 CR3: 00000001286b2000 CR4: 00000000000006e0
Oct 03 18:05:06 router kernel: Call Trace:
Oct 03 18:05:06 router kernel:  <IRQ>
Oct 03 18:05:06 router kernel:  ? find_bib_session6.constprop.0+0x765/0x7b0 [jool_common]
Oct 03 18:05:06 router kernel:  ? __warn+0x81/0x130
Oct 03 18:05:06 router kernel:  ? find_bib_session6.constprop.0+0x765/0x7b0 [jool_common]
Oct 03 18:05:06 router kernel:  ? report_bug+0x182/0x1b0
Oct 03 18:05:06 router kernel:  ? handle_bug+0x42/0x90
Oct 03 18:05:06 router kernel:  ? exc_invalid_op+0x17/0x80
Oct 03 18:05:06 router kernel:  ? asm_exc_invalid_op+0x1a/0x20
Oct 03 18:05:06 router kernel:  ? find_bib_session6.constprop.0+0x765/0x7b0 [jool_common]
Oct 03 18:05:06 router kernel:  ? find_bib_session6.constprop.0+0x765/0x7b0 [jool_common]
Oct 03 18:05:06 router kernel:  bib_add_tcp6+0xf5/0x320 [jool_common]
Oct 03 18:05:06 router kernel:  filtering_and_updating+0x60f/0x630 [jool_common]
Oct 03 18:05:06 router kernel:  ? __pfx_tcp_state_machine+0x10/0x10 [jool_common]
Oct 03 18:05:06 router kernel:  ? determine_in_tuple+0x60/0x8d0 [jool_common]
Oct 03 18:05:06 router kernel:  core_common+0x38/0x130 [jool_common]
Oct 03 18:05:06 router kernel:  core_6to4+0x80/0xe0 [jool_common]
Oct 03 18:05:06 router kernel:  hook_ipv6+0x5f/0x80 [jool_common]
Oct 03 18:05:06 router kernel:  nf_hook_slow+0x45/0xd0
Oct 03 18:05:06 router kernel:  nf_hook_slow_list+0x95/0x140
Oct 03 18:05:06 router kernel:  ip6_sublist_rcv+0x2d2/0x300
Oct 03 18:05:06 router kernel:  ? __pfx_ip6_rcv_finish+0x10/0x10
Oct 03 18:05:06 router kernel:  ipv6_list_rcv+0x13f/0x180
Oct 03 18:05:06 router kernel:  __netif_receive_skb_list_core+0x1f5/0x2d0
Oct 03 18:05:06 router kernel:  netif_receive_skb_list_internal+0x1bb/0x300
Oct 03 18:05:06 router kernel:  napi_complete_done+0x72/0x1c0
Oct 03 18:05:06 router kernel:  virtnet_poll+0x3e5/0x580 [virtio_net]
Oct 03 18:05:06 router kernel:  __napi_poll+0x2b/0x1c0
Oct 03 18:05:06 router kernel:  net_rx_action+0x2b1/0x390
Oct 03 18:05:06 router kernel:  handle_softirqs+0xe5/0x2f0
Oct 03 18:05:06 router kernel:  __irq_exit_rcu+0xb7/0xd0
Oct 03 18:05:06 router kernel:  common_interrupt+0x86/0xa0
Oct 03 18:05:06 router kernel:  </IRQ>
Oct 03 18:05:06 router kernel:  <TASK>
Oct 03 18:05:06 router kernel:  asm_common_interrupt+0x26/0x40
Oct 03 18:05:06 router kernel: RIP: 0010:pv_native_safe_halt+0xf/0x20
Oct 03 18:05:06 router kernel: Code: 0b 90 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 73 7b 3c 00 fb f4 <c3> cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
Oct 03 18:05:06 router kernel: RSP: 0018:ffffc900000c7ed8 EFLAGS: 00000246
Oct 03 18:05:06 router kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Oct 03 18:05:06 router kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 03 18:05:06 router kernel: RBP: ffff8881003f2080 R08: 0000000000000000 R09: 0000000000000000
Oct 03 18:05:06 router kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Oct 03 18:05:06 router kernel: R13: 0000000000000000 R14: ffff8881003f2080 R15: 0000000000000000
Oct 03 18:05:06 router kernel:  default_idle+0x9/0x30
Oct 03 18:05:06 router kernel:  default_idle_call+0x2c/0xe0
Oct 03 18:05:06 router kernel:  do_idle+0x1f1/0x230
Oct 03 18:05:06 router kernel:  cpu_startup_entry+0x2a/0x30
Oct 03 18:05:06 router kernel:  start_secondary+0x11e/0x140
Oct 03 18:05:06 router kernel:  secondary_startup_64_no_verify+0x18f/0x19b
Oct 03 18:05:06 router kernel:  </TASK>
Oct 03 18:05:06 router kernel: ---[ end trace 0000000000000000 ]---
{
  "framework": "netfilter",
  "global": {
    "pool6": "64:ff9b::/96"
  },
  "instance": "default",
  "pool4": [
    {
      "port range": "10000-65535",
      "prefix": "82.195.87.128/28",
      "protocol": "TCP"
    },
    {
      "port range": "10000-65535",
      "prefix": "82.195.87.128/28",
      "protocol": "UDP"
    },
    {
      "port range": "10000-65535",
      "prefix": "82.195.87.128/28",
      "protocol": "ICMP"
    }
  ]
}

Jool: 4.1.13
Kernel: 6.6.52
Distro: NixOS

ydahhrk commented 1 month ago

Hey. Sorry about the wait.

This one looks difficult. Might take a while to find.

Couple of notes:

1

This is not a crash; it's a warning. It's supposed to be an impossible situation, but Jool recovers anyway. The worst that should happen is the packet gets discarded. (And that mess of text in the logs.)

Was your kernel really unusable after this?

2

This is the hole punching code.

I've never seen anyone talk about it, so I don't think anyone's using it, at least consciously. I implemented it so long ago that it might well have been broken by some intrusive refactor over the years.

Are you really trying to punch a hole through Jool? If you don't care about it, you can turn it off:

jool global update maximum-simultaneous-opens 0

or

  "global": {
    "pool6": "64:ff9b::/96",
    "maximum-simultaneous-opens": 0
  },

mweinelt commented 1 month ago

> Was your kernel really unusable after this?

Yes, the VM was not forwarding packets any longer, and I was unable to SSH in. Needed a hard reset.

> Are you really trying to punch a hole through Jool? If you don't care about it, you can turn it off:

No, we were not explicitly trying to do hole punching. I can give that a shot, but it will be 2025-09 before I get another chance to deploy that setup.

ydahhrk commented 1 month ago

> Yes, the VM was not forwarding packets any longer, and I was unable to SSH in. Needed a hard reset.

Drat. This is two bugs, then.


This is altogether weird. As far as I can tell, hole punching shouldn't even work on Jool, because it doesn't do TCP port preservation at all.

It seems this code isn't really doing anything. (Apart from crashing, that is.) It's looking like the right course of action would be to just delete it.

Or implement configurable port preservation somehow. Except no one has ever requested it.

ydahhrk commented 1 month ago

(Just thinking out loud.)

Bit of background:

The session members are called src6, dst6, src4 and dst4. They're named after the packet fields in the IPv6 -> IPv4 direction.

In TCP, dst6 always equals pool6 + dst4. (e.g. [64:ff9b::192.0.2.1]:8080 = 64:ff9b::/96 + 192.0.2.1:8080)
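
(To make the "pool6 + dst4" arithmetic concrete, here is a minimal Python sketch. It assumes a /96 pool6, where the IPv4 address simply fills the last 32 bits. The helper names `embed_ipv4` and `dst6_from_dst4` are made up for illustration; this is not Jool code.)

```python
import ipaddress


def embed_ipv4(pool6: str, addr4: str) -> ipaddress.IPv6Address:
    """Place an IPv4 address in the low 32 bits of a /96 prefix."""
    net = ipaddress.IPv6Network(pool6)
    if net.prefixlen != 96:
        # Other prefix lengths use a different bit layout; not handled here.
        raise ValueError("this sketch only handles /96 prefixes")
    low32 = int(ipaddress.IPv4Address(addr4))
    return ipaddress.IPv6Address(int(net.network_address) | low32)


def dst6_from_dst4(pool6: str, addr4: str, port: int):
    """The port is carried over unchanged; only the address is mapped."""
    return embed_ipv4(pool6, addr4), port


addr, port = dst6_from_dst4("64:ff9b::/96", "192.0.2.1", 8080)
print(f"[{addr}]:{port}")  # same address as [64:ff9b::192.0.2.1]:8080
```

Python prints the address in pure hexadecimal form, but it compares equal to `ipaddress.IPv6Address("64:ff9b::192.0.2.1")`.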

The names make less sense in the IPv4 -> IPv6 direction, where they seem inverted: an incoming IPv4 packet's source is dst4 and its destination is src4.

But that's something RFC 6146 seems happy to live with.

Jool's current hole punching algorithm (which, according to my notes, involved a bunch of guesswork) is:

  1. An unknown IPv4 TCP packet arrives. ("Unknown" means that Jool has no state for it.)
  2. Jool stores the known fields (dst6, src4 and dst4) in a dedicated table for 6 seconds.
  3. The first IPv6 TCP packet that has no state, arrives within those 6 seconds, and matches the stored dst6 claims the connection.
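
(A rough Python model of the three steps above, to make the matching rule easier to reason about. Everything here is hypothetical: `PendingTable`, `store` and `claim` are illustration names, the "has no state" checks are assumed to have happened in the caller, and time is passed in explicitly. It is not how the kernel module is structured.)

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Endpoint = Tuple[str, int]  # (address, port)
TIMEOUT = 6.0               # seconds an unknown IPv4 packet is remembered


@dataclass
class PendingOpen:
    dst6: Endpoint
    src4: Endpoint
    dst4: Endpoint
    expires: float


class PendingTable:
    """Stand-in for the dedicated table from step 2."""

    def __init__(self) -> None:
        self.entries: List[PendingOpen] = []

    def store(self, dst6: Endpoint, src4: Endpoint, dst4: Endpoint, now: float) -> None:
        # Step 2: remember the fields known from the IPv4 packet.
        self.entries.append(PendingOpen(dst6, src4, dst4, now + TIMEOUT))

    def claim(self, src6: Endpoint, dst6: Endpoint, now: float) -> Optional[dict]:
        # Step 3: the first stateless IPv6 packet whose dst6 matches wins.
        self.entries = [e for e in self.entries if e.expires > now]
        for i, entry in enumerate(self.entries):
            if entry.dst6 == dst6:
                del self.entries[i]
                return {"src6": src6, "dst6": entry.dst6,
                        "src4": entry.src4, "dst4": entry.dst4}
        return None


table = PendingTable()
# Step 1: an unknown IPv4 packet from 192.0.2.1:8080 to 203.0.113.5:40000.
table.store(dst6=("64:ff9b::c000:201", 8080),
            src4=("203.0.113.5", 40000),
            dst4=("192.0.2.1", 8080), now=0.0)
session = table.claim(src6=("2001:db8::1", 40000),
                      dst6=("64:ff9b::c000:201", 8080), now=3.0)
print(session)
```

Note that src6 plays no part in the match: the session inherits whatever src4 the IPv4 packet was addressed to.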

So even though Jool doesn't do port preservation between src6 and src4 by itself, punching a hole is still possible because the IPv4 endpoint actually gets to decide the value of src4's port. (In practice, it'll probably always choose the same one as src6.)

Hmmmmmmmmm.