Closed igorkonopko closed 4 years ago
It looks like it is also reproducible with RAID1 2xP4800X 375GB as cache and RAID5 of 8 drives as core. Stack is slightly different, but it looks like it might be the same issue
[ 2821.903501] kernel BUG at /home/ikonopko/open-cas-linux/modules/cas_cache/src/ocf/utils/utils_cache_line.c:29!
[ 2821.903656] invalid opcode: 0000 [#1] SMP
[ 2821.903725] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 iTCO_wdt iTCO_vendor_support vfat fat skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr cas_cache(OE) ipmi_ssif i2c_i801 cas_disk(OE) joydev lpc_ich
[ 2821.905010] sg mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic i2c_algo_bit drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe nvme drm nvme_core megaraid_sas(OE) mdio ptp ahci pps_core libahci dca drm_panel_orientation_quirks libata nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[ 2821.905714] CPU: 4 PID: 3638 Comm: cas_io_cache1_4 Kdump: loaded Tainted: G OE ------------ 3.10.0-1127.el7.x86_64 #1
[ 2821.905894] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[ 2821.906057] task: ffff90ad1af3c1c0 ti: ffff90aca0184000 task.ti: ffff90aca0184000
[ 2821.906172] RIP: 0010:[<ffffffffc0d92255>] [<ffffffffc0d92255>] __set_cache_line_invalid+0x155/0x160 [cas_cache]
[ 2821.906343] RSP: 0018:ffff90aca0187ce8 EFLAGS: 00010212
[ 2821.906425] RAX: 00000000000000ff RBX: ffffb99c30001000 RCX: 00000000047f91c1
[ 2821.906536] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffb99c30001000
[ 2821.906647] RBP: ffff90aca0187d20 R08: 0000000000001000 R09: 00000000000000ff
[ 2821.906757] R10: 0000000000000007 R11: 00000000008ad577 R12: 00000000047f91c1
[ 2821.906867] R13: 0000000000001000 R14: 0000000000000007 R15: ffffb99c30001000
[ 2821.906978] FS: 0000000000000000(0000) GS:ffff90ad1ed00000(0000) knlGS:0000000000000000
[ 2821.907104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2821.907192] CR2: 00007ef9d1436000 CR3: 0000002e1ea5c000 CR4: 00000000007607e0
[ 2821.907302] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2821.907412] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2821.907523] PKRU: 00000000
[ 2821.907565] Call Trace:
[ 2821.907614] [<ffffffffc0d7acb4>] ? ocf_metadata_hash_get_core_and_part_id+0x64/0xd0 [cas_cache]
[ 2821.907756] [<ffffffffc0d923b3>] set_cache_line_invalid_no_flush+0x63/0x90 [cas_cache]
[ 2821.907885] [<ffffffffc0d71e0f>] ocf_engine_map+0x2df/0x340 [cas_cache]
[ 2821.907993] [<ffffffffc0d72358>] ocf_engine_prepare_clines+0x168/0x170 [cas_cache]
[ 2821.908115] [<ffffffffc0d74b89>] ocf_write_wb+0x49/0xf0 [cas_cache]
[ 2821.908218] [<ffffffffc0d8d07c>] ocf_io_handle+0x3c/0x50 [cas_cache]
[ 2821.908321] [<ffffffffc0d8d0c5>] ocf_queue_run_single+0x35/0x40 [cas_cache]
[ 2821.908432] [<ffffffffc0d8d0f8>] ocf_queue_run+0x28/0x50 [cas_cache]
[ 2821.908534] [<ffffffffc0d6676a>] _cas_io_queue_thread+0xfa/0x150 [cas_cache]
[ 2821.908647] [<ffffffff8e4c7780>] ? wake_up_atomic_t+0x30/0x30
[ 2821.908739] [<ffffffffc0d66670>] ? cas_service_ioctl_ctrl+0x2800/0x2800 [cas_cache]
[ 2821.908857] [<ffffffff8e4c6691>] kthread+0xd1/0xe0
[ 2821.908933] [<ffffffff8e4c65c0>] ? insert_kthread_work+0x40/0x40
[ 2821.913972] [<ffffffff8eb92d37>] ret_from_fork_nospec_begin+0x21/0x21
[ 2821.919018] [<ffffffff8e4c65c0>] ? insert_kthread_work+0x40/0x40
[ 2821.924067] Code: 01 00 e8 df 41 a0 cd 4c 89 ef c6 07 00 0f 1f 40 00 44 89 e6 48 89 df e8 3a c9 fe ff 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 0f 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
[ 2821.934685] RIP [<ffffffffc0d92255>] __set_cache_line_invalid+0x155/0x160 [cas_cache]
[ 2821.940936] RSP <ffff90aca0187ce8>
It looks like my 2nd stack is the same as #454
Configuration:
Issue occurs during step 3 after 10-15 minutes - I already reproduced it few times. It looks that issue is somehow configuration dependent - since I was not able to reproduce it with P4800X 375GB - or the reproduction rate was lower.
Step 1 (preconditioning): fio --direct=1 --thread --ioengine=libaio --rw=randwrite --iodepth=128 --bs=16k --name=fio --io_size=100024713216 --filename=/dev/cas1-1
Step 2 (mysql preparation): mkfs.xfs -K -f /dev/cas1-1 mount /dev/cas1-1 /var/lib/mysql/ chown mysql:mysql /var/lib/mysql/ mysqld --initialize-insecure --user=mysql systemctl start mysqld.service
Step 3 (database creation): echo "create database sbt;" | mysql sysbench oltp_read_write --mysql-socket=/var/lib/mysql/mysql.sock --mysql-user=root --mysql-db=sbt --threads=32 --tables=32 --table_size=100000000 prepare
BUG details:
CAS version: