Closed by abitrolly 6 years ago
Need to run fsck when the server is free. The memtest run from a month ago didn't reveal any problems.
Mars was down again. /var/log/syslog.1:
Dec 3 11:51:16 mars systemd-timesyncd[914]: Timed out waiting for reply from 91.189.89.199:123 (ntp.ubuntu.com).
Dec 3 11:51:26 mars systemd-timesyncd[914]: Timed out waiting for reply from 91.189.94.4:123 (ntp.ubuntu.com).
Dec 3 11:53:39 mars NetworkManager[1153]: <info> [1512291219.3707] connectivity: (enp35s0) response shorter than expected 'NetworkManager is online'; assuming captive portal.
Dec 3 11:58:39 mars NetworkManager[1153]: <info> [1512291519.2776] connectivity: (enp35s0) response shorter than expected 'NetworkManager is online'; assuming captive portal.
Dec 3 12:02:19 mars systemd[1]: Started Run anacron jobs.
Dec 3 12:02:19 mars anacron[29104]: Anacron 2.3 started on 2017-12-03
Dec 3 12:02:19 mars anacron[29104]: Normal exit (0 jobs run)
Dec 3 12:03:39 mars NetworkManager[1153]: <info> [1512291819.3449] connectivity: (enp35s0) response shorter than expected 'NetworkManager is online'; assuming captive portal.
Dec 3 12:08:39 mars NetworkManager[1153]: <info> [1512292119.3716] connectivity: (enp35s0) response shorter than expected 'NetworkManager is online'; assuming captive portal.
Dec 3 12:09:57 mars colord[1199]: failed to get session [pid 27963]: No data available
Dec 3 12:13:39 mars NetworkManager[1153]: <info> [1512292419.3713] connectivity: (enp35s0) response shorter than expected 'NetworkManager is online'; assuming captive portal.
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Dec 4 11:18:47 mars rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="1092" x-info="http://www.rsyslog.com"] start
Dec 4 11:18:47 mars systemd-modules-load[389]: Inserted module 'lp'
Dec 4 11:18:47 mars systemd-modules-load[389]: Inserted module 'ppdev'
Dec 4 11:18:47 mars systemd-modules-load[389]: Inserted module 'parport_pc'
Dec 4 11:18:47 mars keyboard-setup.sh[388]: cannot open file /tmp/tmpkbd.vaj1SS
Dec 4 11:18:47 mars systemd[1]: Started udev Kernel Device Manager.
Dec 4 11:18:47 mars systemd[1]: Starting Remount Root and Kernel File Systems...
Dec 4 11:18:47 mars systemd[1]: Started Remount Root and Kernel File Systems.
@litvintech @hleb-albau if that repeats, please add more data to this issue.
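One way to collect that data: the run of ^@ (NUL) bytes in the excerpt above sits between the last entry before the outage and the first entry after the reboot, so locating such runs gives a rough timestamp for each hang. Below is a minimal Python sketch, not a tested tool. The function names and the 16-byte threshold are assumptions made for illustration.

```python
import re
import sys

# One or more consecutive NUL bytes.
NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(data, min_len=16):
    """Return (offset, length) for every run of at least min_len NUL bytes."""
    return [
        (m.start(), m.end() - m.start())
        for m in NUL_RUN.finditer(data)
        if m.end() - m.start() >= min_len
    ]


def last_line_before(data, offset):
    """Return the last line of text found before the given byte offset."""
    lines = data[:offset].splitlines()
    return lines[-1].decode("utf-8", errors="replace") if lines else ""


# Only scans a file when a path is given on the command line,
# e.g. /var/log/syslog.1 (reading it usually needs sudo).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        content = f.read()
    for offset, length in find_nul_runs(content):
        print(f"{length} NUL bytes at offset {offset}")
        print("  last entry before the gap:", last_line_before(content, offset))
```

The last entry before each gap is only an upper bound on when the machine was still alive; the hang itself may have happened minutes later.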
We need a solution for server monitoring.
https://github.com/etsy/statsd seems to be what the Wargaming web team was talking about at the last Minsk Python Meetup.
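As a rough illustration of what statsd-based monitoring could look like, the Python sketch below formats the load averages as StatsD gauges (`<name>:<value>|g`) and sends them over UDP. The metric names, the `mars` prefix and the helper functions are hypothetical. It also assumes a StatsD daemon listening on 127.0.0.1 at the default port 8125, which nothing in this thread confirms is set up.

```python
import os
import socket


def format_gauge(name, value):
    """Encode one StatsD gauge datagram: '<name>:<value>|g'."""
    return f"{name}:{value}|g".encode("ascii")


def load_average_metrics(prefix="mars"):
    """Build gauge datagrams for the 1, 5 and 15 minute load averages."""
    load1, load5, load15 = os.getloadavg()
    return [
        format_gauge(f"{prefix}.load.1min", load1),
        format_gauge(f"{prefix}.load.5min", load5),
        format_gauge(f"{prefix}.load.15min", load15),
    ]


def send_metrics(datagrams, host="127.0.0.1", port=8125):
    """Send each datagram to a StatsD daemon over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for datagram in datagrams:
            sock.sendto(datagram, (host, port))
```

Calling `send_metrics(load_average_metrics())` once a minute from cron would leave a load history that survives a hang, as long as the daemon (or whatever it forwards to) runs on another machine.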
Dec 19 17:52:27 mars kernel: [ 5210.729762] general protection fault: 0000 [#2] SMP
Dec 19 17:52:27 mars kernel: [ 5210.731261] Modules linked in: btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs veth cfg80211 xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc overlay edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel pcbc snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 crypto_simd glue_helper cryptd snd_hda_intel snd_seq_midi snd_seq_midi_event snd_hda_codec snd_rawmidi snd_hda_core eeepc_wmi asus_wmi snd_hwdep sparse_keymap serio_raw video snd_seq wmi_bmof snd_pcm snd_seq_device i2c_piix4 snd_timer ccp snd soundcore
Dec 19 17:52:27 mars kernel: [ 5210.737824] joydev input_leds shpchp 8250_dw mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu mxm_wmi ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm igb psmouse dca i2c_algo_bit ptp ahci nvme pps_core libahci nvme_core gpio_amdpt wmi gpio_generic
Dec 19 17:52:27 mars kernel: [ 5210.741425] CPU: 8 PID: 25302 Comm: bitcoin-httpwor Tainted: G D 4.13.0-19-generic #22-Ubuntu
Dec 19 17:52:27 mars kernel: [ 5210.743185] Hardware name: System manufacturer System Product Name/CROSSHAIR VI HERO, BIOS 1701 09/22/2017
Dec 19 17:52:27 mars kernel: [ 5210.744953] task: ffff94f40b2645c0 task.stack: ffffa4f68e9dc000
Dec 19 17:52:27 mars kernel: [ 5210.746776] RIP: 0010:list_lru_del+0x94/0x140
Dec 19 17:52:27 mars kernel: [ 5210.748399] RSP: 0000:ffffa4f68e9df980 EFLAGS: 00010006
Dec 19 17:52:27 mars kernel: [ 5210.750074] RAX: ffff94fc8b1d07e0 RBX: ffff94fc8c8abe80 RCX: 9c2a8969439858af
Dec 19 17:52:27 mars kernel: [ 5210.751729] RDX: d64808357a5a976e RSI: fffff2e77bab9d9f RDI: ffff94fc8c8abe80
Dec 19 17:52:27 mars kernel: [ 5210.753438] RBP: ffffa4f68e9df9a0 R08: 00000000ffffffff R09: 0000000000000000
Dec 19 17:52:27 mars kernel: [ 5210.755095] R10: ffff94fbaae774a0 R11: 0000000000000001 R12: ffff94fc2ae77490
Dec 19 17:52:27 mars kernel: [ 5210.756749] R13: 0000000000000000 R14: ffff94fbaae77490 R15: 0000000000000000
Dec 19 17:52:27 mars kernel: [ 5210.758417] FS: 00007fbc1b7fe700(0000) GS:ffff94fc9e800000(0000) knlGS:0000000000000000
Dec 19 17:52:27 mars kernel: [ 5210.760062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 19 17:52:27 mars kernel: [ 5210.761803] CR2: 00007fbbeeb41aec CR3: 00000009b99b1000 CR4: 00000000003406e0
Dec 19 17:52:27 mars kernel: [ 5210.763479] Call Trace:
Dec 19 17:52:27 mars kernel: [ 5210.765185] ? count_shadow_nodes+0xb0/0xb0
Dec 19 17:52:27 mars kernel: [ 5210.767281] workingset_update_node+0x4f/0x70
Dec 19 17:52:27 mars kernel: [ 5210.768944] __radix_tree_replace+0x70/0xf0
Dec 19 17:52:27 mars kernel: [ 5210.770535] page_cache_tree_insert+0x84/0xc0
Dec 19 17:52:27 mars kernel: [ 5210.772060] __add_to_page_cache_locked+0xc3/0x200
Dec 19 17:52:27 mars kernel: [ 5210.773536] add_to_page_cache_lru+0x4e/0xe0
Dec 19 17:52:27 mars kernel: [ 5210.775010] ext4_mpage_readpages+0x144/0x980
Dec 19 17:52:27 mars kernel: [ 5210.776384] ? alloc_pages_current+0x6a/0xe0
Dec 19 17:52:27 mars kernel: [ 5210.777735] ext4_readpages+0x33/0x40
Dec 19 17:52:27 mars kernel: [ 5210.779075] __do_page_cache_readahead+0x1c3/0x280
Dec 19 17:52:27 mars kernel: [ 5210.780508] filemap_fault+0x354/0x5e0
Dec 19 17:52:27 mars kernel: [ 5210.781826] ? filemap_fault+0x354/0x5e0
Dec 19 17:52:27 mars kernel: [ 5210.783149] ? filemap_map_pages+0x179/0x320
Dec 19 17:52:27 mars kernel: [ 5210.784609] ext4_filemap_fault+0x31/0x50
Dec 19 17:52:27 mars kernel: [ 5210.786472] __do_fault+0x1e/0xb0
Dec 19 17:52:27 mars kernel: [ 5210.788519] __handle_mm_fault+0xba7/0x1020
Dec 19 17:52:27 mars kernel: [ 5210.790035] handle_mm_fault+0xb1/0x200
Dec 19 17:52:27 mars kernel: [ 5210.791467] __do_page_fault+0x24d/0x4d0
Dec 19 17:52:27 mars kernel: [ 5210.792879] ? filp_close+0x53/0x80
Dec 19 17:52:27 mars kernel: [ 5210.794279] do_page_fault+0x22/0x30
Dec 19 17:52:27 mars kernel: [ 5210.795672] page_fault+0x28/0x30
Dec 19 17:52:27 mars kernel: [ 5210.797047] RIP: 0033:0x558d8f992ae5
Dec 19 17:52:27 mars kernel: [ 5210.798472] RSP: 002b:00007fbc1b7fc810 EFLAGS: 00010246
Dec 19 17:52:27 mars kernel: [ 5210.800050] RAX: 0000000000000000 RBX: 00007fbc1b7fc8b0 RCX: 0000000000000000
Dec 19 17:52:27 mars kernel: [ 5210.801452] RDX: 00007fbc1b7fc8a0 RSI: 00007fbc1b7fc8e0 RDI: 00007fbc1b7fc8b0
Dec 19 17:52:27 mars kernel: [ 5210.802866] RBP: 00007fbbeeb41ac0 R08: 00007fbc1b7fc8a0 R09: 00007fbc1b7fc900
Dec 19 17:52:27 mars kernel: [ 5210.804278] R10: 0000000000000001 R11: 0000000000000000 R12: 00007fbc1b7fc8a0
Dec 19 17:52:27 mars kernel: [ 5210.805687] R13: 0000000000000000 R14: 00007fbc1b7fc8a0 R15: 0000558d923f7250
Dec 19 17:52:27 mars kernel: [ 5210.807106] Code: 0f 1f 40 00 31 c0 5b 41 5c 41 5d 41 5e 5d c3 48 8b 53 20 48 85 d2 74 05 e9 3b 00 00 00 48 8d 43 08 49 8b 0e 49 8b 56 08 48 89 df <48> 89 51 08 48 89 0a 4d 89 36 4d 89 76 08 48 83 68 10 01 48 83
Dec 19 17:52:27 mars kernel: [ 5210.808634] RIP: list_lru_del+0x94/0x140 RSP: ffffa4f68e9df980
Dec 19 17:52:27 mars kernel: [ 5210.810342] ---[ end trace 964d9a4894744b45 ]---
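To compare traces like the one above across crashes, the few fields that identify an oops (fault type, process, faulting kernel function) can be pulled out of syslog with a short script. This is a minimal Python sketch. The regular expressions are assumptions based only on the line formats visible in this trace, and `summarize_oops` is a hypothetical helper name.

```python
import re

# Patterns modelled on the lines in the trace above; other kernel
# versions may format these lines differently.
FAULT = re.compile(r"(general protection fault): \d+ \[#(\d+)\]")
COMM = re.compile(r"PID: (\d+) Comm: (\S+)")
# 0010 is the kernel code segment on x86_64, so this skips the
# user-space "RIP: 0033:..." line further down the trace.
KERNEL_RIP = re.compile(r"RIP: 0010:(\S+)")


def summarize_oops(lines):
    """Collect the first fault type, process and kernel RIP seen in lines."""
    summary = {}
    for line in lines:
        m = FAULT.search(line)
        if m and "fault" not in summary:
            summary["fault"] = m.group(1)
            summary["oops_number"] = int(m.group(2))
        m = COMM.search(line)
        if m and "comm" not in summary:
            summary["pid"] = int(m.group(1))
            summary["comm"] = m.group(2)
        m = KERNEL_RIP.search(line)
        if m and "rip" not in summary:
            summary["rip"] = m.group(1)
    return summary
```

If several crashes report the same kernel function, that points towards a software or driver bug. Faults scattered across unrelated functions would fit a hardware problem better.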
It hung again last Saturday with this screen, which shows 99.9% swap and 49.6% memory used???
I have killed the GUI part since then: I needed to get sound out of the Radeon RX 470/480 HDMI, and because the open source driver doesn't support HDMI, I installed the AMD Pro driver, which doesn't work. It looks like we need the standard desktop driver, not Pro.
Now getting a newer driver from http://support.amd.com/en-us/kb-articles/Pages/Radeon-Software-for-Linux-Release-Notes.aspx
@YodaMike rebooted the server several days ago, because docker refused to kill a frozen container. New uptime is here:
$ uptime
13:29:47 up 1 day, 23:06, 3 users, load average: 1.75, 1.87, 1.94
The ticket can be closed on 20 days uptime.
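The 20-day criterion could be checked automatically rather than by eye. The Python sketch below assumes a Linux host, where the first field of /proc/uptime is the uptime in seconds. The function names and the default threshold are illustrative.

```python
SECONDS_PER_DAY = 86400


def uptime_days(proc_uptime_text):
    """Convert the contents of /proc/uptime into days of uptime."""
    seconds = float(proc_uptime_text.split()[0])
    return seconds / SECONDS_PER_DAY


def can_close(proc_uptime_text, target_days=20):
    """Return True once the uptime has reached the target number of days."""
    return uptime_days(proc_uptime_text) >= target_days


def read_proc_uptime(path="/proc/uptime"):
    """Read the raw uptime text (Linux only)."""
    with open(path) as f:
        return f.read()
```

For the uptime shown above (1 day, 23:06, which is 169560 seconds), `uptime_days` gives about 1.96 days, so `can_close` returns False.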
Updates require us to reboot the server from time to time, so it is hard to reach 20 days of uptime. But after installing the native AMD drivers we couldn't catch it hanging. We've also got sound over HDMI as a result. Closing for now.
UPDATE: Native AMD drivers seem to make the system more stable. Ticket can be closed on 20 days uptime.
sudo less /var/log/syslog