alibaba / tsar

Taobao System Activity Reporter
Apache License 2.0

File "/var/lib/systemd/coredump/core.tsar.0.xxx.lz4" is not readable: Permission denied #97

Open limkokhole opened 5 years ago

limkokhole commented 5 years ago

Using a process tracer, I noticed that tsar dumps core frequently. When I checked coredumpctl list, I was shocked:

...
Tue 2019-07-16 02:18:02 +08   26818     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:19:01 +08   26875     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:20:02 +08   26934     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:21:01 +08   26981     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:22:02 +08   27042     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:23:01 +08   27120     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:24:02 +08   27158     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:25:01 +08   27193     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:26:02 +08   27242     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:27:01 +08   27300     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:28:02 +08   27343     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:29:01 +08   27390     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:30:02 +08   27434     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:31:01 +08   27480     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:32:02 +08   27545     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:33:01 +08   27583     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:34:02 +08   27624     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:35:01 +08   27800     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:36:02 +08   27843     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:37:01 +08   27950     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:38:02 +08   28025     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:39:01 +08   28077     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:40:02 +08   28205     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:41:01 +08   28286     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:42:02 +08   28387     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:43:01 +08   28428     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:44:02 +08   28499     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:45:01 +08   28696     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:46:02 +08   28747     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:47:01 +08   28818     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:48:02 +08   28915     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:49:01 +08   28954     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:50:02 +08   29049     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:51:01 +08   29153     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 02:52:02 +08   29233     0     0   8 error     /usr/bin/tsar
Tue 2019-07-16 15:10:22 +08   11781     0     0   8 error     /usr/bin/tsar
lines 6735-6770/6770 (END

I saw that the parent process is cron, so I checked /etc/cron.d/tsar:

# cron tsar collect once per minute
MAILTO=""
* * * * * root /usr/bin/tsar --cron > /dev/null 2>&1

Running it directly fails as well, and, as with the earlier cron core dumps, trying to dump the core reports a permission-denied error:

$ sudo /usr/bin/tsar --cron
Floating point exception
$ coredumpctl dump 11781 --output tsar.11781.alamak
           PID: 11781 (tsar)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 8 (FPE)
     Timestamp: Tue 2019-07-16 15:10:21 +08 (58s ago)
  Command Line: /usr/bin/tsar --cron
    Executable: /usr/bin/tsar
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (xiaobai)
       ...
       Storage: /var/lib/systemd/coredump/core.tsar.0.4537918427684a69bcb0013d8b9d7172.11781.1563261021000000.lz4 (inaccessible)
       Message: Process 11781 (tsar) of user 0 dumped core.

                Stack trace of thread 11781:
                #0  0x00007f1a51539d7f store_single_partition (mod_partition.so)
                #1  0x00007f1a51539ea6 read_partition_stat (mod_partition.so)
                #2  0x00005625a1125417 collect_record (tsar)
                #3  0x00005625a1125b57 running_cron (tsar)
                #4  0x00005625a1123250 main (tsar)
                #5  0x00007f1a53180b97 __libc_start_main (libc.so.6)
                #6  0x00005625a11232ea _start (tsar)
File "/var/lib/systemd/coredump/core.tsar.0.4537918427684a69bcb0013d8b9d7172.11781.1563261021000000.lz4" is not readable: Permission denied
$ ls -l "/var/lib/systemd/coredump/core.tsar.0.4537918427684a69bcb0013d8b9d7172.11781.1563261021000000.lz4"
-rw-r----- 1 root root 588594 Jul  16 15:10 /var/lib/systemd/coredump/core.tsar.0.4537918427684a69bcb0013d8b9d7172.11781.1563261021000000.lz4
$ 

I have temporarily removed /etc/cron.d/tsar. My platform is Ubuntu 18.04.2 LTS.
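
Signal 8 (FPE) raised inside store_single_partition would be consistent with an integer division by zero, for example when a mounted filesystem reports zero total blocks, although nothing in this report confirms that this is the cause. The following is only a minimal sketch of the kind of guard that avoids such a crash. The helper usage_percent and its parameters are hypothetical and are not taken from the tsar source.

```c
#include <stdint.h>

/*
 * Hypothetical helper, NOT taken from tsar: percentage of used blocks
 * on a partition.
 *
 * On x86, integer division by zero raises SIGFPE ("Floating point
 * exception"), so a zero total is reported as 0% instead of being
 * used as a divisor.
 */
static unsigned int usage_percent(uint64_t used_blocks, uint64_t total_blocks)
{
    if (total_blocks == 0) {
        /* Pseudo-filesystems may report no blocks at all. */
        return 0;
    }
    /* Multiply first to keep integer precision. */
    return (unsigned int)(used_blocks * 100 / total_blocks);
}
```

With this guard, usage_percent(50, 100) yields 50 and usage_percent(1, 0) yields 0 rather than killing the process. Whether mod_partition.so actually needs such a check can only be confirmed by reading the core file as root or by inspecting the module source.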