Waziup / WaziCloud

WAZIUP Cloud and local platform

Mongod is in CrashLoopBackOff #163

cdupont closed this issue 7 years ago

cdupont commented 7 years ago

Mongod keeps restarting. It appears that the volume /mnt/vol, mounted in the master VM from SIRIS, has gone read-only:

[203520.512206] INFO: task jbd2/vdb-8:618 blocked for more than 120 seconds.
[203520.513329]       Not tainted 4.4.0-79-generic #100-Ubuntu
[203520.514302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[203520.515506] jbd2/vdb-8      D ffff8802357afad8     0   618      2 0x00000000
[203520.515516]  ffff8802357afad8 00000006343558c0 ffff880236238cc0 ffff880235700cc0
[203520.515528]  ffff8802357b0000 ffff88023fd16dc0 7fffffffffffffff ffffffff8183d150
[203520.515530]  ffff8802357afc30 ffff8802357afaf0 ffffffff8183c955 0000000000000000
[203520.515532] Call Trace:
[203520.515563]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.515565]  [<ffffffff8183c955>] schedule+0x35/0x80
[203520.515567]  [<ffffffff8183faa5>] schedule_timeout+0x1b5/0x270
[203520.515581]  [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20
[203520.515583]  [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20
[203520.515590]  [<ffffffff810f625c>] ? ktime_get+0x3c/0xb0
[203520.515592]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.515594]  [<ffffffff8183be84>] io_schedule_timeout+0xa4/0x110
[203520.515596]  [<ffffffff8183d16b>] bit_wait_io+0x1b/0x70
[203520.515598]  [<ffffffff8183ccfd>] __wait_on_bit+0x5d/0x90
[203520.515599]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.515601]  [<ffffffff8183cdb2>] out_of_line_wait_on_bit+0x82/0xb0
[203520.515610]  [<ffffffff810c4370>] ? autoremove_wake_function+0x40/0x40
[203520.515621]  [<ffffffff81246c52>] __wait_on_buffer+0x32/0x40
[203520.515627]  [<ffffffff812ef728>] jbd2_journal_commit_transaction+0xf48/0x1870
[203520.515634]  [<ffffffff810ed10e>] ? try_to_del_timer_sync+0x5e/0x90
[203520.515640]  [<ffffffff812f3d4a>] kjournald2+0xca/0x250
[203520.515643]  [<ffffffff810c4330>] ? wake_atomic_t_function+0x60/0x60
[203520.515645]  [<ffffffff812f3c80>] ? commit_timeout+0x10/0x10
[203520.515672]  [<ffffffff810a0c25>] kthread+0xe5/0x100
[203520.515676]  [<ffffffff810a0b40>] ? kthread_create_on_node+0x1e0/0x1e0
[203520.515680]  [<ffffffff81840e0f>] ret_from_fork+0x3f/0x70
[203520.515682]  [<ffffffff810a0b40>] ? kthread_create_on_node+0x1e0/0x1e0
[203520.516963] INFO: task mongod:9574 blocked for more than 120 seconds.
[203520.517932]       Not tainted 4.4.0-79-generic #100-Ubuntu
[203520.518772] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[203520.520200] mongod          D ffff8801e7d03bb8     0  9574   9052 0x00000000
[203520.520210]  ffff8801e7d03bb8 ffff8801e7d03ba8 ffff880236238cc0 ffff8801f2ec8000
[203520.520212]  ffff8801e7d04000 ffff88023fd16dc0 7fffffffffffffff ffffffff8183d150
[203520.520214]  ffff8801e7d03d18 ffff8801e7d03bd0 ffffffff8183c955 0000000000000000
[203520.520216] Call Trace:
[203520.520219]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.520221]  [<ffffffff8183c955>] schedule+0x35/0x80
[203520.520223]  [<ffffffff8183faa5>] schedule_timeout+0x1b5/0x270
[203520.520234]  [<ffffffff813cae86>] ? blk_flush_plug_list+0xd6/0x240
[203520.520237]  [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20
[203520.520238]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.520240]  [<ffffffff8183be84>] io_schedule_timeout+0xa4/0x110
[203520.520242]  [<ffffffff8183d16b>] bit_wait_io+0x1b/0x70
[203520.520243]  [<ffffffff8183ccfd>] __wait_on_bit+0x5d/0x90
[203520.520254]  [<ffffffff8118e5cb>] wait_on_page_bit+0xcb/0xf0
[203520.520258]  [<ffffffff810c4370>] ? autoremove_wake_function+0x40/0x40
[203520.520260]  [<ffffffff8118e6e3>] __filemap_fdatawait_range+0xf3/0x160
[203520.520262]  [<ffffffff81190581>] ? __filemap_fdatawrite_range+0xd1/0x100
[203520.520267]  [<ffffffff8118e764>] filemap_fdatawait_range+0x14/0x30
[203520.520269]  [<ffffffff811906cf>] filemap_write_and_wait_range+0x3f/0x70
[203520.520276]  [<ffffffff81296461>] ext4_sync_file+0x101/0x350
[203520.520285]  [<ffffffff812437cb>] vfs_fsync_range+0x4b/0xb0
[203520.520287]  [<ffffffff8124388d>] do_fsync+0x3d/0x70
[203520.520291]  [<ffffffff81243b43>] SyS_fdatasync+0x13/0x20
[203520.520293]  [<ffffffff81840a72>] entry_SYSCALL_64_fastpath+0x16/0x71
[203520.520298] INFO: task mongod:9670 blocked for more than 120 seconds.
[203520.521269]       Not tainted 4.4.0-79-generic #100-Ubuntu
[203520.522113] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[203520.523259] mongod          D ffff8801e7e138e8     0  9670   9052 0x00000000
[203520.523270]  ffff8801e7e138e8 ffff88023423ac00 ffff880236238000 ffff880090de72c0
[203520.523271]  ffff8801e7e14000 ffff88023fc96dc0 7fffffffffffffff ffffffff8183d150
[203520.523281]  ffff8801e7e13a48 ffff8801e7e13900 ffffffff8183c955 0000000000000000
[203520.523283] Call Trace:
[203520.523287]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.523288]  [<ffffffff8183c955>] schedule+0x35/0x80
[203520.523290]  [<ffffffff8183faa5>] schedule_timeout+0x1b5/0x270
[203520.523293]  [<ffffffff810bd195>] ? update_sd_lb_stats+0x115/0x530
[203520.523295]  [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20
[203520.523297]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.523298]  [<ffffffff8183be84>] io_schedule_timeout+0xa4/0x110
[203520.523300]  [<ffffffff8183d16b>] bit_wait_io+0x1b/0x70
[203520.523302]  [<ffffffff8183ccfd>] __wait_on_bit+0x5d/0x90
[203520.523303]  [<ffffffff8183d150>] ? bit_wait+0x60/0x60
[203520.523305]  [<ffffffff8183cdb2>] out_of_line_wait_on_bit+0x82/0xb0
[203520.523307]  [<ffffffff810c4370>] ? autoremove_wake_function+0x40/0x40
[203520.523309]  [<ffffffff812ed205>] do_get_write_access+0x245/0x490
[203520.523312]  [<ffffffff812ed4a1>] jbd2_journal_get_write_access+0x51/0x70
[203520.523318]  [<ffffffff812d04ab>] __ext4_journal_get_write_access+0x3b/0x80
[203520.523320]  [<ffffffff81297ff1>] __ext4_new_inode+0x531/0x13a0
[203520.523325]  [<ffffffff812aa8e9>] ext4_create+0x119/0x1b0
[203520.523331]  [<ffffffff8121b937>] vfs_create+0x127/0x190
[203520.523334]  [<ffffffff8121eb2c>] path_openat+0x120c/0x1330
[203520.523336]  [<ffffffff8121fe41>] do_filp_open+0x91/0x100
[203520.523341]  [<ffffffff812268f3>] ? dput+0x153/0x220
[203520.523343]  [<ffffffff8122d786>] ? __alloc_fd+0x46/0x190
[203520.523348]  [<ffffffff8120e2f8>] do_sys_open+0x138/0x2a0
[203520.523351]  [<ffffffff810f6579>] ? do_gettimeofday+0x29/0x90
[203520.523353]  [<ffffffff8120e47e>] SyS_open+0x1e/0x20
[203520.523355]  [<ffffffff81840a72>] entry_SYSCALL_64_fastpath+0x16/0x71
[203530.969871] blk_update_request: I/O error, dev vdb, sector 21311856
[203530.971329] Aborting journal on device vdb-8.
[203530.972092] EXT4-fs error (device vdb) in __ext4_new_inode:932: Journal has aborted
[203530.972151] EXT4-fs error (device vdb) in __ext4_new_inode:932: Journal has aborted
[203530.972197] EXT4-fs error (device vdb) in ext4_reserve_inode_write:5144: Journal has aborted
[203535.100989] blk_update_request: I/O error, dev vdb, sector 278968
[203535.102396] EXT4-fs warning (device vdb): ext4_end_bio:329: I/O error -5 writing to inode 32 (offset 0 size 0 starting block 34872)
[203535.102400] Buffer I/O error on device vdb, logical block 34871
[203535.103529] blk_update_request: I/O error, dev vdb, sector 295200
[203535.104679] EXT4-fs warning (device vdb): ext4_end_bio:329: I/O error -5 writing to inode 32 (offset 0 size 0 starting block 36901)
[203535.104682] Buffer I/O error on device vdb, logical block 36900
[203554.089909] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[203554.095955] EXT4-fs (vdb): Remounting filesystem read-only
[203554.102506] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[203554.119205] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[203554.122343] EXT4-fs error (device vdb) in ext4_reserve_inode_write:5144: Journal has aborted
[203554.132027] EXT4-fs error (device vdb) in __ext4_new_inode:1121: Journal has aborted
[203554.134754] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[203554.137333] EXT4-fs error (device vdb) in ext4_evict_inode:246: Journal has aborted
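
The key sequence in the log above (an I/O error on vdb, the journal abort, then the read-only remount) is easy to lose among the hung-task stack traces. The following is a minimal sketch of a filter that pulls those events out of a dmesg dump. The function name `find_storage_events` and the pattern list are illustrative assumptions, not part of WaziCloud; the patterns only cover the message formats that appear in this log.

```python
import re

# Message fragments copied from the kernel log above; the set is illustrative
# and only matches the formats seen in this particular dump.
PATTERNS = {
    "io_error": re.compile(r"blk_update_request: I/O error, dev (\w+), sector \d+"),
    "journal_abort": re.compile(r"Aborting journal on device ([\w-]+)"),
    "remount_ro": re.compile(r"EXT4-fs \((\w+)\): Remounting filesystem read-only"),
}

# dmesg lines start with a "[seconds.microseconds]" timestamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_storage_events(log_text):
    """Return (timestamp, kind, device) tuples for matching kernel log lines."""
    events = []
    for line in log_text.splitlines():
        m = TIMESTAMP.match(line)
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, message = float(m.group(1)), m.group(2)
        for kind, pattern in PATTERNS.items():
            hit = pattern.search(message)
            if hit:
                events.append((ts, kind, hit.group(1)))
                break
    return events
```

Fed the output of `dmesg`, this returns the events in log order, which makes it clear that the I/O error on vdb precedes the journal abort and the read-only remount.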

cdupont commented 7 years ago

We 'fixed it' by unmounting and re-mounting the volume.
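
To confirm whether the volume is currently mounted read-only, before or after remounting it, the mount options can be read from `/proc/mounts`. This is a minimal sketch, assuming a Linux host and the `/mnt/vol` mount point mentioned above; the helper names `mount_options` and `is_read_only` are hypothetical.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Each line has the form: device mount_point fstype options dump pass.
    Returns None if the mount point is not listed.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # If a mount point is listed more than once, the last entry wins.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point):
    """True if mount_point is listed and carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# Example use on a Linux host (assumption: the volume is mounted at /mnt/vol):
#   with open("/proc/mounts") as f:
#       print(is_read_only(f.read(), "/mnt/vol"))
```

Note that this only reports the mount state. It says nothing about the underlying I/O errors on vdb, which may recur after a remount.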