Malshare / MalShare

http://www.malshare.com

Unplanned outage - 2019 September 13 #32

Closed silascutler closed 4 years ago

silascutler commented 4 years ago

System was identified as unresponsive at 11:00 PM (UTC). External monitoring suggests the outage may have started around 3:51 AM (UTC). Investigating the issue.

silascutler commented 4 years ago

System is hung at boot. Waiting on a reboot atm. Last messages I see are:

blk_update_request: I/O error, sdb, sector 2403408872
INFO: task jbd2/dm-0-8:863 blocked for more than 120 seconds.
          Tainted: G                 I          4.4.0-161-generic #189-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task jbd2/dm-0-8:863 blocked for more than 120 seconds.
          Tainted: G                 I          4.4.0-161-generic #189-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task jbd2/dm-0-8:863 blocked for more than 120 seconds.
          Tainted: G                 I          4.4.0-161-generic #189-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
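
A minimal sketch, assuming the console output is available as plain text, of how the failing device and sector could be pulled out of messages like the ones above. The function name `find_io_errors` and the regular expression are illustrative assumptions, not tooling used during this incident. The byte offset assumes the kernel's usual 512-byte block-layer sector unit.

```python
import re

# Matches lines such as:
#   blk_update_request: I/O error, sdb, sector 2403408872
# and also the stock kernel form that includes "dev sdb".
IO_ERROR_RE = re.compile(
    r"blk_update_request: I/O error, (?:dev )?(?P<dev>\w+), sector (?P<sector>\d+)"
)

SECTOR_BYTES = 512  # assumption: block-layer sectors are reported in 512-byte units


def find_io_errors(log_text):
    """Return a sorted list of unique (device, sector, byte_offset) tuples."""
    hits = set()
    for match in IO_ERROR_RE.finditer(log_text):
        sector = int(match.group("sector"))
        hits.add((match.group("dev"), sector, sector * SECTOR_BYTES))
    return sorted(hits)


# Sample text copied from the messages in this thread.
sample = """\
blk_update_request: I/O error, sdb, sector 2403408872
INFO: task jbd2/dm-0-8:863 blocked for more than 120 seconds.
blk_update_request: I/O error, sdb, sector 2403408872
"""

for dev, sector, offset in find_io_errors(sample):
    print(f"{dev}: sector {sector} (byte offset {offset})")
```

Collapsing repeats into a set makes it easier to see whether every error points at the same sector, as it does here, or whether the failures are spread across the disk.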

silascutler commented 4 years ago

Testing:

  1. Requested a second reboot, booting to the previous kernel release `4.4.0-159-generic #189-Ubuntu`.
  2. During bootup, the system starts checking the disk for /storage; at roughly 80% it reports `blk_update_request: I/O error, sdb, sector 2403408872` and bootup freezes.
  3. Rebooting again

silascutler commented 4 years ago

Hosting provider also reported an error message: E1810 Hard drive 5 fault

silascutler commented 4 years ago

1) Booted into RAID management tool. Saw disk #4 was failing in Disk Group 1 - Forced online. Group is now showing as online.
2) Booted and marked secondary drives as defaults,nofail in /etc/fstab.
3) Rebooting to see if that will boot the system properly
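
For reference, `nofail` in the options field tells the boot process not to treat a missing or unmountable device as fatal. The following is a rough sketch of making that change programmatically rather than by hand. The helper name `add_nofail` and the example entries are hypothetical, since the real /etc/fstab is not shown in this thread; only the /storage mount point comes from the comments above.

```python
def add_nofail(fstab_text, mount_points):
    """Return fstab_text with `nofail` added to the options of the given mount points."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave blanks, comments, and lines without an options field untouched.
        if not fields or fields[0].startswith("#") or len(fields) < 4:
            out.append(line)
            continue
        if fields[1] in mount_points:
            options = fields[3].split(",")
            if "nofail" not in options:
                options.append("nofail")
            fields[3] = ",".join(options)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical entries for illustration only.
example = """\
# <device>  <mount>  <type>  <options>  <dump>  <pass>
/dev/mapper/root  /         ext4  errors=remount-ro  0  1
/dev/mapper/data  /storage  ext4  defaults           0  2
"""

print(add_nofail(example, {"/storage"}))
```

The function works on text rather than on the file itself, so the result can be reviewed before anything is written back.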

silascutler commented 4 years ago

Looks like some files may be missing. Still checking on disks. Host is live.

silascutler commented 4 years ago

Moved site to offline mode. Running fsck on storage drive.

silascutler commented 4 years ago

fsck finished. Testing and checking the drive before returning the site to live.

silascutler commented 4 years ago

Running a final reboot. Final Checks

silascutler commented 4 years ago

Host is back online. A quick check shows one or two recently uploaded files were lost in the outage. Closing out.