My server experienced several panic during last a few months.
If cache-line dirty is full, it panics in several days; If not, it survives much longer.
In last panic, I printed all stacks of all CPU and found a hard lockup.
It seems the process-context and IRQ-context cause a dead LOCK?
Since the lock->waiters_lock is shared between 2 context, should we use spinlock_irqsave() or spinlock_bh() instead?
vmcore-dmesg.txt vmcore-dmesg.txt
Description
My server experienced several panic during last a few months. If cache-line dirty is full, it panics in several days; If not, it survives much longer. In last panic, I printed all stacks of all CPU and found a hard lockup.
It looks like all the IO thread are stuck in getting queue_lock. Then I found the suspicious lock holder.
It seems the process-context and IRQ-context cause a dead LOCK? Since the lock->waiters_lock is shared between 2 context, should we use spinlock_irqsave() or spinlock_bh() instead?
Expected Behavior
Actual Behavior
Steps to Reproduce
Possible Fix
I changed all the spinlock() to spinlock_irqsave() in ocf/src/utils/utils_async_lock.c. but not sure this is a real fix.
Logs
full dmesg of last panic is attached
Your Environment
╔══════════════════╤═══════════╤═══════╤═════════════╗ ║ Usage statistics │ Count │ % │ Units ║ ╠══════════════════╪═══════════╪═══════╪═════════════╣ ║ Occupancy │ 234104000 │ 100.0 │ 4KiB Blocks ║ ║ Free │ 32 │ 0.0 │ 4KiB Blocks ║ ║ Clean │ 1216 │ 0.0 │ 4KiB Blocks ║ ║ Dirty │ 234102784 │ 100.0 │ 4KiB Blocks ║ ╚══════════════════╧═══════════╧═══════╧═════════════╝
╔══════════════════════╤═══════════╤═══════╤══════════╗ ║ Request statistics │ Count │ % │ Units ║ ╠══════════════════════╪═══════════╪═══════╪══════════╣ ║ Read hits │ 433982 │ 0.3 │ Requests ║ ║ Read partial misses │ 889 │ 0.0 │ Requests ║ ║ Read full misses │ 54794 │ 0.0 │ Requests ║ ║ Read total │ 489665 │ 0.3 │ Requests ║ ╟──────────────────────┼───────────┼───────┼──────────╢ ║ Write hits │ 112766652 │ 73.3 │ Requests ║ ║ Write partial misses │ 5472 │ 0.0 │ Requests ║ ║ Write full misses │ 30930034 │ 20.1 │ Requests ║ ║ Write total │ 143702158 │ 93.3 │ Requests ║ ╟──────────────────────┼───────────┼───────┼──────────╢ ║ Pass-Through reads │ 143 │ 0.0 │ Requests ║ ║ Pass-Through writes │ 9751876 │ 6.3 │ Requests ║ ║ Serviced requests │ 144191823 │ 93.7 │ Requests ║ ╟──────────────────────┼───────────┼───────┼──────────╢ ║ Total requests │ 153943842 │ 100.0 │ Requests ║ ╚══════════════════════╧═══════════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════════╤═══════╤═════════════╗ ║ Block statistics │ Count │ % │ Units ║ ╠══════════════════════════════════╪═══════════╪═══════╪═════════════╣ ║ Reads from core(s) │ 277173 │ 0.1 │ 4KiB Blocks ║ ║ Writes to core(s) │ 313603620 │ 99.9 │ 4KiB Blocks ║ ║ Total to/from core(s) │ 313880793 │ 100.0 │ 4KiB Blocks ║ ╟──────────────────────────────────┼───────────┼───────┼─────────────╢ ║ Reads from cache │ 198071619 │ 20.7 │ 4KiB Blocks ║ ║ Writes to cache │ 758133245 │ 79.3 │ 4KiB Blocks ║ ║ Total to/from cache │ 956204864 │ 100.0 │ 4KiB Blocks ║ ╟──────────────────────────────────┼───────────┼───────┼─────────────╢ ║ Reads from exported object(s) │ 3031928 │ 0.3 │ 4KiB Blocks ║ ║ Writes to exported object(s) │ 876722477 │ 99.7 │ 4KiB Blocks ║ ║ Total to/from exported object(s) │ 879754405 │ 100.0 │ 4KiB Blocks ║ ╚══════════════════════════════════╧═══════════╧═══════╧═════════════╝
╔════════════════════╤═══════╤═════╤══════════╗ ║ Error statistics │ Count │ % │ Units ║ ╠════════════════════╪═══════╪═════╪══════════╣ ║ Cache read errors │ 0 │ 0.0 │ Requests ║ ║ Cache write errors │ 0 │ 0.0 │ Requests ║ ║ Cache total errors │ 0 │ 0.0 │ Requests ║ ╟────────────────────┼───────┼─────┼──────────╢ ║ Core read errors │ 0 │ 0.0 │ Requests ║ ║ Core write errors │ 0 │ 0.0 │ Requests ║ ║ Core total errors │ 0 │ 0.0 │ Requests ║ ╟────────────────────┼───────┼─────┼──────────╢ ║ Total errors │ 0 │ 0.0 │ Requests ║ ╚════════════════════╧═══════╧═════╧══════════╝ SIT [root@sz01bla1269test ~]# casadm -P -i 25 Cache Id 25 Cache Size 56189280 [4KiB Blocks] / 214.35 [GiB] Cache Device /dev/sdf15 Exported Object - Core Devices 1 Inactive Core Devices 0 Write Policy wb Cleaning Policy acp Promotion Policy always Cache line size 64 [KiB] Metadata Memory Footprint 276.6 [MiB] Dirty for 88775 [s] / 1 [d] 39 [m] 35 [s] Status Running
╔══════════════════╤══════════╤═══════╤═════════════╗ ║ Usage statistics │ Count │ % │ Units ║ ╠══════════════════╪══════════╪═══════╪═════════════╣ ║ Occupancy │ 56189264 │ 100.0 │ 4KiB Blocks ║ ║ Free │ 16 │ 0.0 │ 4KiB Blocks ║ ║ Clean │ 5447840 │ 9.7 │ 4KiB Blocks ║ ║ Dirty │ 50741424 │ 90.3 │ 4KiB Blocks ║ ╚══════════════════╧══════════╧═══════╧═════════════╝
╔══════════════════════╤══════════╤═══════╤══════════╗ ║ Request statistics │ Count │ % │ Units ║ ╠══════════════════════╪══════════╪═══════╪══════════╣ ║ Read hits │ 163537 │ 0.3 │ Requests ║ ║ Read partial misses │ 4 │ 0.0 │ Requests ║ ║ Read full misses │ 1340 │ 0.0 │ Requests ║ ║ Read total │ 164881 │ 0.4 │ Requests ║ ╟──────────────────────┼──────────┼───────┼──────────╢ ║ Write hits │ 35817225 │ 76.3 │ Requests ║ ║ Write partial misses │ 666 │ 0.0 │ Requests ║ ║ Write full misses │ 10611184 │ 22.6 │ Requests ║ ║ Write total │ 46429075 │ 98.9 │ Requests ║ ╟──────────────────────┼──────────┼───────┼──────────╢ ║ Pass-Through reads │ 0 │ 0.0 │ Requests ║ ║ Pass-Through writes │ 329045 │ 0.7 │ Requests ║ ║ Serviced requests │ 46593956 │ 99.3 │ Requests ║ ╟──────────────────────┼──────────┼───────┼──────────╢ ║ Total requests │ 46923001 │ 100.0 │ Requests ║ ╚══════════════════════╧══════════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════════╤═══════╤═════════════╗ ║ Block statistics │ Count │ % │ Units ║ ╠══════════════════════════════════╪═══════════╪═══════╪═════════════╣ ║ Reads from core(s) │ 10385 │ 0.0 │ 4KiB Blocks ║ ║ Writes to core(s) │ 101331247 │ 100.0 │ 4KiB Blocks ║ ║ Total to/from core(s) │ 101341632 │ 100.0 │ 4KiB Blocks ║ ╟──────────────────────────────────┼───────────┼───────┼─────────────╢ ║ Reads from cache │ 100243009 │ 28.0 │ 4KiB Blocks ║ ║ Writes to cache │ 257151669 │ 72.0 │ 4KiB Blocks ║ ║ Total to/from cache │ 357394678 │ 100.0 │ 4KiB Blocks ║ ╟──────────────────────────────────┼───────────┼───────┼─────────────╢ ║ Reads from exported object(s) │ 988786 │ 0.4 │ 4KiB Blocks ║ ║ Writes to exported object(s) │ 259636373 │ 99.6 │ 4KiB Blocks ║ ║ Total to/from exported object(s) │ 260625159 │ 100.0 │ 4KiB Blocks ║ ╚══════════════════════════════════╧═══════════╧═══════╧═════════════╝
╔════════════════════╤═══════╤═════╤══════════╗ ║ Error statistics │ Count │ % │ Units ║ ╠════════════════════╪═══════╪═════╪══════════╣ ║ Cache read errors │ 0 │ 0.0 │ Requests ║ ║ Cache write errors │ 0 │ 0.0 │ Requests ║ ║ Cache total errors │ 0 │ 0.0 │ Requests ║ ╟────────────────────┼───────┼─────┼──────────╢ ║ Core read errors │ 0 │ 0.0 │ Requests ║ ║ Core write errors │ 0 │ 0.0 │ Requests ║ ║ Core total errors │ 0 │ 0.0 │ Requests ║ ╟────────────────────┼───────┼─────┼──────────╢ ║ Total errors │ 0 │ 0.0 │ Requests ║ ╚════════════════════╧═══════╧═════╧══════════╝