Open jamercee opened 5 years ago
Follow-up: two systems that both used comments have had kernel panics within 24 hours of starting to use them. Both systems had been up and stable for more than 2 years before these panics. This was the only change made to either system in the preceding 24 hours, so it seems likely the panics were caused by using ipsets with comments.
Given that comments simply allocate memory, unless you crashed while adding or deleting a rule, this is incredibly unlikely.
Either way, this repository does not reflect mainline kernel development, and you have not stated which kernel version you are using or provided any stack trace or kernel panic information.
Without any of the above, do not expect help.
It may be entirely circumstantial, but both systems crashed within 24 hours of starting to use ipsets with comments. Each had uptime in excess of 2 years and neither had ever crashed before. Both systems had been using ipsets without comments successfully for several years, and the only change made to them was the introduction of one new set on each that included comments. Both systems had approximately 1,200 entries in their new commented sets at the time of their crash.
After the first system crashed, we attempted to ipset save / ipset restore the commented set from one machine to the other. Several of the saved entries would not import into the target. Upon further investigation, we discovered a handful of the comments were corrupted (see original report for examples). It was the corruption of comments from ipset save that led us to suspect our new use of the comment facility.
The kernel:
# uname -a
Linux router 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux
My apologies for failing to include a stack trace with the original report; I have included one below for one of the two systems. The stack trace for the second system was not found in syslog. I can share a screen capture of that stack trace, taken with a cell phone at the time, but the quality makes it difficult to read.
Mar 24 21:24:13 x kernel: [118695742.067889] BUG: unable to handle kernel paging request at 0000416df0000251
Mar 24 21:24:13 x kernel: [118695742.068207] IP: [<ffffffff814173fa>] dev_get_stats+0xa/0x200
Mar 24 21:24:13 x kernel: [118695742.068417] PGD 0
Mar 24 21:24:13 x kernel: [118695742.068611] Oops: 0000 [#3] SMP
Mar 24 21:24:13 x kernel: [118695742.068893] Modules linked in: nfnetlink_queue bluetooth 6lowpan_iphc rfkill binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_comment xt_tcpudp ipt_REJECT xt_NFLOG nfnetlink_log xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_set xt_iprange xt_multiport xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ip_set_list_set ip_set_hash_net ip_set_hash_ip ip_set nfnetlink x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul ttm drm_kms_helper drm joydev hid_generic ipmi_devintf glue_helper usbhid hid pl2303 dcdbas evdev usbserial shpchp acpi_power_meter iTCO_wdt iTCO_vendor_support ablk_helper sb_edac cryptd edac_core ipmi_si tpm_tis tpm pcspkr ipmi_msghandler wmi acpi_pad lpc_ich mfd_core mei_me mei button processor thermal_sys loop autofs4 ext4 crc16 mbcache jbd2 sg sd_mod sr_mod ses crc_t10dif crct10dif_generic cdrom enclosure crct10dif_pclmul crct10dif_common crc32c_intel ehci_pci igb ahci ehci_hcd i2c_algo_bit libahci libata tg3 i2c_core megaraid_sas usbcore dca libphy ptp pps_core usb_common scsi_mod
Mar 24 21:24:13 x kernel: [118695742.079735] CPU: 1 PID: 6344 Comm: snmpd Tainted: G D 3.16.0-4-amd64 #1 Debian 3.16.7-ckt9-3~deb8u1
Mar 24 21:24:13 x kernel: [118695742.079888] Hardware name: Dell Inc. PowerEdge x20/0KM5PX, BIOS 2.3.3 07/10/2014
Mar 24 21:24:13 x kernel: [118695742.080035] task: ffff88012a3ee3d0 ti: ffff880077a58000 task.ti: ffff880077a58000
Mar 24 21:24:13 x kernel: [118695742.080181] RIP: 0010:[<ffffffff814173fa>] [<ffffffff814173fa>] dev_get_stats+0xa/0x200
Mar 24 21:24:13 x kernel: [118695742.080414] RSP: 0018:ffff880077a5bd50 EFLAGS: 00010282
Mar 24 21:24:13 x kernel: [118695742.080533] RAX: ffffffff81675a80 RBX: 0000416df0000069 RCX: 0000000000000000
Mar 24 21:24:13 x kernel: [118695742.080679] RDX: ffff880077a5bec8 RSI: ffff880077a5bdd0 RDI: 0000416df0000069
Mar 24 21:24:13 x kernel: [118695742.080823] RBP: 0000416df0000069 R08: 0000000000000000 R09: 00000000000000c8
Mar 24 21:24:13 x kernel: [118695742.080968] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000400
Mar 24 21:24:13 x kernel: [118695742.081110] R13: ffff880077a5bf58 R14: 0000416df0000069 R15: ffff8800c698f640
Mar 24 21:24:13 x kernel: [118695742.081257] FS: 00007f215ae59700(0000) GS:ffff88012f020000(0000) knlGS:0000000000000000
Mar 24 21:24:13 x kernel: [118695742.081404] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 21:24:13 x kernel: [118695742.081524] CR2: 0000416df0000251 CR3: 00000000c6c9f000 CR4: 00000000001407e0
Mar 24 21:24:13 x kernel: [118695742.081669] Stack:
Mar 24 21:24:13 x kernel: [118695742.081780] 0000416df0000069 ffff8800c698f640 ffffffff814369d1 ffff8800c28cf7c0
Mar 24 21:24:13 x kernel: [118695742.082258] 0000000000000000 0000000000000000 00007f215ae67000 00000007f215ae67
Mar 24 21:24:13 x kernel: [118695742.082736] 00007f2100018000 ffff8800b1bb3258 ffff8800c6baa918 ffff8800c6baa918
Mar 24 21:24:13 x kernel: [118695742.083213] Call Trace:
Mar 24 21:24:13 x kernel: [118695742.083331] [<ffffffff814369d1>] ? dev_seq_printf_stats+0x21/0xf0
Mar 24 21:24:13 x kernel: [118695742.083457] [<ffffffff8116e149>] ? vma_merge+0xf9/0x340
Mar 24 21:24:13 x kernel: [118695742.083581] [<ffffffff811c9e1a>] ? seq_puts+0x3a/0x60
Mar 24 21:24:13 x kernel: [118695742.083703] [<ffffffff81436ab0>] ? dev_seq_show+0x10/0x30
Mar 24 21:24:13 x kernel: [118695742.083825] [<ffffffff811c975b>] ? seq_read+0x20b/0x360
Mar 24 21:24:13 x kernel: [118695742.083949] [<ffffffff81206929>] ? proc_reg_read+0x39/0x70
Mar 24 21:24:13 x kernel: [118695742.084074] [<ffffffff811a7ef3>] ? vfs_read+0x93/0x170
Mar 24 21:24:13 x kernel: [118695742.084196] [<ffffffff811a8b22>] ? SyS_read+0x42/0xa0
Mar 24 21:24:13 x kernel: [118695742.084319] [<ffffffff81510e4d>] ? system_call_fast_compare_end+0x10/0x15
Mar 24 21:24:13 x kernel: [118695742.084463] Code: 00 8b 16 48 83 c7 04 48 83 c6 04 83 e8 04 89 57 fc e9 77 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 fd 53 <4c> 8b 87 e8 01 00 00 48 89 f3 49 83 78 78 00 0f 84 a1 00 00 00
Mar 24 21:24:13 x kernel: [118695742.090087] RIP [<ffffffff814173fa>] dev_get_stats+0xa/0x200
Mar 24 21:24:13 x kernel: [118695742.090378] RSP <ffff880077a5bd50>
Mar 24 21:24:13 x kernel: [118695742.090493] CR2: 0000416df0000251
Mar 24 21:24:13 x kernel: [118695742.090641] ---[ end trace 146c49fa2b4159ee ]---
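One way to read the faulting address in this trace, offered as a rough sketch and not a diagnosis: the instruction marked `<4c>` in the Code: line decodes as a load from a fixed offset off RDI, and on x86-64 RDI holds the first function argument (here, the pointer handed to dev_get_stats). The snippet below only uses values copied from the oops and checks that RDI plus that offset equals CR2. The variable names are illustrative.

```python
# Values copied from the oops above.
rdi = 0x0000416DF0000069  # first-argument register: the pointer passed to dev_get_stats
cr2 = 0x0000416DF0000251  # address the CPU failed to read

# The faulting instruction from the "Code:" line, starting at the <4c> marker.
insn = bytes.fromhex("4c8b87e8010000")

# 4c 8b 87 encodes `mov disp32(%rdi), %r8`; the remaining four bytes are the
# little-endian 32-bit displacement.
disp = int.from_bytes(insn[3:7], "little")

print(hex(disp))          # offset being read relative to RDI
print(hex(rdi + disp))    # address the instruction tried to load from
print(rdi + disp == cr2)  # does it match the reported fault address?
```

If the check holds, the pointer itself (0000416df0000069) was already bad when dev_get_stats was entered: it lies in the user-space half of the address range, whereas the other kernel pointers in the register dump start with ffff. That is consistent with some structure having been overwritten earlier, but it does not say what overwrote it. Note also that the header reads Oops: 0000 [#3] with taint flag D, so this is the third oops since boot and the earlier ones may be closer to the original fault.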
As I said, my evidence that ipset comments caused the crash may be circumstantial.
We've started using comments with ipsets and discovered what may be a bug. Randomly, a comment will contain garbled bytes or random strings. Here are several examples extracted from an ipset list.
Note: although the last entry had a legible comment, it is not the comment value we set.
There are more than a thousand entries, most of which preserved their comments correctly. We typically enter a 4-character comment (to keep kernel memory consumption to a minimum).
The command used to create the set was:
create 24hour hash:ip family inet hashsize 1024 maxelem 65536 timeout 86400 comment
The system that displayed this behavior is Debian 8.0
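For anyone trying to reproduce this, a small script can scan ipset save output for comments that do not look like the ones that were entered. The following is a minimal sketch under stated assumptions: every legitimate comment is exactly 4 ASCII alphanumeric characters (the report only says this is typical), and entries appear as `add <set> <ip> timeout <n> comment "<text>"` lines. The function name `suspicious_entries` and the sample addresses are hypothetical.

```python
import re

# Matches the trailing `comment "..."` field of an `add` line.
COMMENT_RE = re.compile(r'comment "(.*)"\s*$')


def suspicious_entries(lines, expected_len=4):
    """Return (line_number, line) pairs whose comment looks corrupted.

    Assumption: a valid comment is exactly `expected_len` ASCII
    alphanumeric characters. Adjust the check to your own convention.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.startswith("add "):
            continue  # skip `create` lines and blank lines
        match = COMMENT_RE.search(line)
        if match is None:
            flagged.append((number, line))  # comment missing or unparseable
            continue
        comment = match.group(1)
        looks_ok = (
            len(comment) == expected_len
            and comment.isascii()
            and comment.isalnum()
        )
        if not looks_ok:
            flagged.append((number, line))
    return flagged


if __name__ == "__main__":
    sample = [
        "create 24hour hash:ip family inet hashsize 1024 maxelem 65536 timeout 86400 comment",
        'add 24hour 192.0.2.1 timeout 86012 comment "abcd"',
        'add 24hour 192.0.2.2 timeout 86012 comment "ab\x01\x7fzzzz"',
    ]
    for number, line in suspicious_entries(sample):
        print(number, repr(line))
```

To run it against a real dump, read the saved file with `open(path, errors="replace")` so that garbled bytes do not raise a decoding error. A check like this cannot catch the case noted above, where a comment is legible but is not the value that was set; that would need a comparison against the list of comments originally entered.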