Closed avocadochicken closed 2 years ago
erigon version 2021.10.3-alpha 5.10.79-1-MANJARO #1 SMP PREEMPT Fri Nov 12 20:26:09 UTC 2021 x86_64 GNU/Linux
https://gist.githubusercontent.com/avocadochicken/54f0914b11c5a1b037da510ca7956360/raw/fdd0cc941beb8c43ded7cd00a6b19f298bd4b5c8/gistfile1.txt
It killed the entire drive. I don't know if it was Erigon or the HW / drive itself...
[261077.109886] nvme nvme0: I/O 259 QID 27 timeout, aborting
[261078.629894] nvme nvme0: I/O 872 QID 21 timeout, aborting
[261107.829035] nvme nvme0: I/O 259 QID 27 timeout, reset controller
[261138.548445] nvme nvme0: I/O 24 QID 0 timeout, reset controller
[261199.769796] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[261199.783697] nvme nvme0: Abort status: 0x371
[261199.783698] nvme nvme0: Abort status: 0x371
[261230.328950] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[261230.328953] nvme nvme0: Removing after probe failure status: -19
[261235.825742] INFO: task kcompactd0:250 blocked for more than 122 seconds.
[261235.825744] Tainted: P W OE 5.10.79-1-MANJARO #1
[261235.825745] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[261235.825746] task:kcompactd0 state:D stack: 0 pid: 250 ppid: 2 flags:0x00004000
[261235.825749] Call Trace:
[261235.825755] __schedule+0x288/0x800
[261235.825758] ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[261235.825760] schedule+0x5b/0xc0
[261235.825761] io_schedule+0x42/0x70
[261235.825763] bit_wait_io+0xd/0x50
[261235.825764] __wait_on_bit_lock+0x5d/0xa0
[261235.825766] out_of_line_wait_on_bit_lock+0x92/0xb0
[261235.825769] ? var_wake_function+0x20/0x20
[261235.825772] __buffer_migrate_page.part.0+0xab/0x2b0
[261235.825774] move_to_new_page+0xa1/0x2f0
[261235.825776] ? page_counter_uncharge+0x36/0x50
[261235.825777] ? uncharge_batch+0xcf/0x140
[261235.825779] ? free_unref_page_commit+0x98/0x120
[261235.825781] migrate_pages+0x9c1/0xe50
[261235.825784] ? isolate_freepages_block+0x410/0x410
[261235.825785] ? split_map_pages+0x170/0x170
[261235.825786] ? migrate_page_states+0x290/0x290
[261235.825788] compact_zone+0x606/0xdb0
[261235.825791] ? finish_task_switch+0x75/0x250
[261235.825793] proactive_compact_node+0x8f/0xe0
[261235.825796] kcompactd+0x317/0x390
[261235.825798] ? add_wait_queue_exclusive+0x70/0x70
[261235.825799] ? kcompactd_do_work+0x240/0x240
[261235.825802] kthread+0x133/0x150
[261235.825803] ? kthread_associate_blkcg+0xc0/0xc0
[261235.825806] ret_from_fork+0x22/0x30
[261235.825841] INFO: task jbd2/nvme0n1p1-:13937 blocked for more than 122 seconds.
[261235.825842] Tainted: P W OE 5.10.79-1-MANJARO #1
[261235.825843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[261235.825843] task:jbd2/nvme0n1p1- state:D stack: 0 pid:13937 ppid: 2 flags:0x00004080
[261235.825845] Call Trace:
[261235.825847] __schedule+0x288/0x800
[261235.825849] ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[261235.825850] schedule+0x5b/0xc0
[261235.825851] io_schedule+0x42/0x70
[261235.825852] bit_wait_io+0xd/0x50
[261235.825853] __wait_on_bit+0x2a/0x90
[261235.825855] out_of_line_wait_on_bit+0x92/0xb0
[261235.825856] ? var_wake_function+0x20/0x20
[261235.825861] jbd2_journal_commit_transaction+0x1304/0x1d00 [jbd2]
[261235.825864] ? sugov_get_util+0x60/0x60
[261235.825870] kjournald2+0xaf/0x280 [jbd2]
[261235.825872] ? add_wait_queue_exclusive+0x70/0x70
[261235.825876] ? jbd2_journal_release_jbd_inode+0x150/0x150 [jbd2]
[261235.825877] kthread+0x133/0x150
[261235.825879] ? kthread_associate_blkcg+0xc0/0xc0
[261235.825880] ret_from_fork+0x22/0x30
[261235.825892] INFO: task erigon:15253 blocked for more than 122 seconds.
[261235.825892] Tainted: P W OE 5.10.79-1-MANJARO #1
[261235.825893] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[261235.825894] task:erigon state:D stack: 0 pid:15253 ppid: 14757 flags:0x00000080
[261235.825895] Call Trace:
[261235.825897] __schedule+0x288/0x800
[261235.825898] ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[261235.825899] schedule+0x5b/0xc0
[261235.825900] io_schedule+0x42/0x70
[261235.825901] bit_wait_io+0xd/0x50
[261235.825903] __wait_on_bit+0x2a/0x90
[261235.825904] out_of_line_wait_on_bit+0x92/0xb0
[261235.825905] ? var_wake_function+0x20/0x20
[261235.825909] do_get_write_access+0x274/0x3e0 [jbd2]
[261235.825913] jbd2_journal_get_write_access+0x4f/0x80 [jbd2]
[261235.825923] __ext4_journal_get_write_access+0x72/0x120 [ext4]
[261235.825936] ext4_reserve_inode_write+0x7f/0xb0 [ext4]
[261235.825947] __ext4_mark_inode_dirty+0x52/0x220 [ext4]
[261235.825956] ? __ext4_journal_start_sb+0x9f/0x110 [ext4]
[261235.825966] ext4_dirty_inode+0x5f/0x80 [ext4]
[261235.825968] __mark_inode_dirty+0x1b5/0x360
[261235.825970] generic_update_time+0x71/0xd0
[261235.825972] file_update_time+0x123/0x140
[261235.825981] ext4_page_mkwrite+0x93/0x670 [ext4]
[261235.825983] ? futex_wake+0x14d/0x180
[261235.825985] do_page_mkwrite+0x51/0xd0
[261235.825987] do_wp_page+0x240/0x2f0
[261235.825989] handle_mm_fault+0x120c/0x1a50
[261235.825992] do_user_addr_fault+0x1e6/0x420
[261235.825995] exc_page_fault+0x64/0x160
[261235.825997] ? asm_exc_page_fault+0x8/0x30
[261235.825999] asm_exc_page_fault+0x1e/0x30
[261235.826001] RIP: 0033:0x7fe5871a8dee
[261235.826002] RSP: 002b:00007fe55ee20bc0 EFLAGS: 00010246
[261235.826003] RAX: 0000000000000000 RBX: 0000000000003b95 RCX: 0000000000020000
[261235.826004] RDX: 00000000000000b2 RSI: 00007fe55ee21640 RDI: 00007fe5084ed100
[261235.826005] RBP: 00007fe5084ed100 R08: 0000000000000010 R09: 0000000000000002
[261235.826006] R10: 00007fe555821ba0 R11: 0000000000000000 R12: 00007fe5084ed100
[261235.826006] R13: 0000000000000000 R14: 00007fe5084ed110 R15: 00007fe55ee21640
@avocadochicken Erigon's DB doesn't read from or write to the disk itself; it only uses the mmap/msync syscalls, and the Linux kernel does the actual reads and writes. So you likely have a hardware failure or a misconfiguration.
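To illustrate the mmap/msync pattern mentioned above, here is a minimal Python sketch. It is not Erigon's code (Erigon is written in Go), and the helper name `write_via_mmap`, the file name, and the 4096-byte size are made up for the example. The point is only that the process stores bytes into mapped memory and asks the kernel to sync; the kernel and the drive do the actual I/O.

```python
import mmap
import os
import tempfile


def write_via_mmap(path: str, payload: bytes, size: int = 4096) -> bytes:
    """Write payload through a shared file mapping, then read it back."""
    # Create a file of fixed size to back the mapping.
    with open(path, "wb") as f:
        f.truncate(size)

    with open(path, "r+b") as f:
        # Default access gives a shared, writable mapping of the file.
        with mmap.mmap(f.fileno(), size) as mm:
            # A plain memory store: the process issues no write() call here.
            mm[: len(payload)] = payload
            # On Unix, flush() calls msync(); the kernel writes the dirty
            # pages back to the underlying file.
            mm.flush()

    # Read back through the regular file API to confirm the data landed.
    with open(path, "rb") as f:
        return f.read(len(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.bin")  # hypothetical file name
        print(write_via_mmap(demo_path, b"hello"))
```

With this pattern, a store into a mapped page enters the kernel through a page fault, which looks consistent with the erigon task in the trace above being blocked under exc_page_fault and ext4_page_mkwrite while waiting on the journal.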