crash-utility / crash

Linux kernel crash utility
https://crash-utility.github.io
788 stars 266 forks source link

crash utility: add log / dmesg PRINTK_CALLER id support #164

Closed egchron closed 5 months ago

egchron commented 6 months ago

Submission to Project: crash Component: dmesg Files: printk.c makedumpfile.c help.c makedumpfile.h Code level patch applied against: 8.0.4++ - latest code pulled from https://github.com/crash-utility/crash.git crash Issue Tested with Kernel version and makedumpfile version: Linux Kernel Testing: Linux 6.6.6 #4 SMP PREEMPT_DYNAMIC Tue Dec 12 23:11:30 PST 2023 x86_64 GNU/Linux Linux 5.4.264 #9 SMP Thu Dec 21 07:00:08 PST 2023 makedumpfile Testing: makedumpfile: version 1.7.4++ (released on 6 Nov 2023) See Issue 13 for makedumpfile: add support for demsg PRINTK_CALLER id field dmesg Testing: util-linux 2.39.3++ See Issue 2609 for util-linux: add support for dmesg PRINTK_CALLER id field to standard dmesg kmsg interface

Add support so that dmesg entries include the optional Linux Kernel debug CONFIG option PRINTK_CALLER which adds an optional dmesg field that contains the Thread Id or CPU Id that is issuing the printk to add the message to the kernel ring buffer. If enabled, this CONFIG option makes debugging simpler as dmesg entries for a specific thread or CPU can be recognized.

The dmesg command supports printing the PRINTK_CALLER field. The old syslog format (dmesg -S) and recently support was added for dmesg using /dev/kmsg interface with util-linux Issue #2609 and we upstreamed a commit that is under review.

We've upstreamed a patch for makedumpfile that adds support for the PRINTK_CALLER id field so it will be available with the commands:

    makedumpfile --dump-dmesg /proc/vmcore dmesgfile
    makedumpfile --dump-dmesg -x vmlinux /proc/vmcore dmesgfile

The additional field provided by PRINTK_CALLER is only present if it was configured for the Linux kernel on the running system. The PRINTK_CALLER is a debug option and not configured by default so the dmesg output will only change for those kernels where the option was configured when the kernel was built. For users who went to the trouble to configure PRINTK_CALLER and have the extra field available. for debugging, having dmesg print the field is very helpful and so will be makedumpfile and so it would be very useful to have crash support for dump analysis.

Size of the PRINTK_CALLER field is determined by the maximum number tasks that can be run on the system which is limited by the value of /proc/sys/kernel/pid_max as pid values are from 0 to value - 1. This value determines the number of id digits needed by the caller id. The PRINTK_CALLER field is printed as T for a Task Id or C for a CPU Id for a printk in CPU context. The values are left space padded and enclosed in parentheses such as: [ T123] or [ C16]

Displaying the PRINTK_CALLER field in the log/dmesg record output:

Given the layout of log/dmesg records printed by crash, for example:

crash> log -m ... [ 0.000000] <7>e820: remove [mem 0xff000000-0xffffffff] reserved [ 0.000000] <6>SMBIOS 3.4.0 present. ... [ 0.014179] <6>Secure boot disabled [ 0.014179] <6>RAMDISK: [mem 0x3cf4f000-0x437bbfff] ... [ 663.328848] <6>sysrq: Trigger a crash [ 663.328859] <0>Kernel panic - not syncing: sysrq triggered crash

Our patch adds the PRINTK_CALLER field after the timestamp if the printk_caller log / dmesg option (-P) is selected:

crash> log -m -P ... [ 0.014179] [ T1] <6>Secure boot disabled [ 0.014179] [ T29] <6>RAMDISK: [mem 0x3cf4f000-0x437bbfff] ...

This is consistent placement with dmesg and makedumpfile.

To produce dmesg output with the PRINTK_CALLER id included, we add a new log / dmesg command option: -P

The PRINTK_CALLER id field is printed only if the -P option is selected. The description of the log -P option that is seen in the help is:

crash> log help

log dump system message buffer [-TtdmasP]

... ...

-P Display the PRINTK_CALLER id field that identifies the thread id or the CPU id of the task that issued the printk to add the message to the kernel ring buffer. Displaying this field is only possible for Linux kernels that are Linux 5.1 or newer since kernels prior to Linux 5.1 did not provide a PRINTK_CALLER field. So, this option is ignored for kernels older than Linux 5.1. The CONFIG_PRINTK_CALLER kernel configuration option should be selected to make the caller_id field available and for vmcore files your version of makedumpfile must support dumping the caller_id field for it to be available here.

Also seen in the help file :

On systems that are properly configured to make the log caller_id field available to the crash utility, you can select to print log records and include the caller_id field with each log record. The log PRINTK_CALLER id option (-P) will print either the Thread id or the CPU id depending on the context of the process at the time the printk request was made. The output of the thread id (begins with T) or the CPU id (begins with C) is placed inside square brackets and is padded with leading space to keep alignment. The caller_id field follows the timestamp field if that field is selected:

crash> log -P
...
[    0.014179] [     T1] Secure boot disabled
[    0.014179] [    T29] RAMDISK: [mem 0x3cf4f000-0x437bbfff]
[    0.198789] [     C0] DMAR: DRHD: handling fault status reg 3
...
crash> log -P -m
...
[    0.014179] [     T1] <6>Secure boot disabled
[    0.014179] [    T29] <6>RAMDISK: [mem 0x3cf4f000-0x437bbfff]
[    0.198789] [     C0] <6>DMAR: DRHD: handling fault status reg 3
...
crash> log -t -P -m
...
[     T1] <6>Secure boot disabled
[    T29] <6>RAMDISK: [mem 0x3cf4f000-0x437bbfff]
[     C0] <6>DMAR: DRHD: handling fault status reg 3
...
crash> log -T -P -m
...
[Tue Dec 12 23:25:03 PST 2023] [     T1] <6>Secure boot disabled
[Tue Dec 12 23:25:03 PST 2023] [    T29] <6>RAMDISK: [mem 0x3cf4f000-0x437bbfff]
[Tue Dec 12 23:25:03 PST 2023] [     C0] <6>DMAR: DRHD: handling fault status reg 3
...

Signed-off-by: Ivan Delalande colona@arista.com Signed-off-by: Edward Chron echron@arista.com

egchron commented 6 months ago

Here is the patch I posted to the mailing list, assuming it actually got there. I tried to subscribe but have had issues and tried to contact the support folks but haven't heard back from them. End of the year, maybe folks are on vacation for the holidays or just using any remaining vacation.

crash-add-log-dmesg-PRINTK_CALLER-id-support-rev0.patch.gz

egchron commented 6 months ago

Here are test results for both makedumpfile --dump-dmesg and for the crash utility for the log -P option that the patch that adds the caller_id field to the log records printed if the log caller_id field is present in the dump.

I have a patch submitted upstream to makedumpfile (makedumpfile Issue-13) that provides PRINTK_CALLER id support for makedumpfile. That makedumpfile patch is needed to test the crash facility since to provides the support to produce vmcore files that include the caller_id field for log records. So that is why I also include output from makedumpfie --dump-dmesg so that you can compare the makedumpfile output for the makedumpfile --dump-dmesg command versus the output crash log -P produces to print out the dmesg records.

Note that I tested and provide results for two Linux Kernels: linux-5.4.256 and linux-6.6.6

The reason for this is to cover the two versions of the Linux kernel ringbuffer that provide PRINTK_CALLER support the versions for printk_log and printk_info.

There are two versions because PRINTK_CALLER id support as provided by the CONFIG_PRINTK_CALLER kernel configuration parameter that was added in Linux-5.1 which is using the printk_log internal structure. The newer printk_caller structure used by the lockless ringbuffer was added in Linux 5.10 and has been in use ever since and this structure replaced the older printk_log structure with the printk_info structure.

So Linux-5.4.264 covers the older printk_log internal structure format and Linux-6.6.6 covers the newer printk_info internal structure format of the Linux ringbuffer.

Here are testing output files created using the crash patch posted here and in the mailing list and the makedumpfile patch posted to makedumpfile Issue-13.

Using enhanced makedumpfile 1.7.4++ output: makedumpfile --dump-dmesg

For Linux Kernel 5.4.264 with PRINTK_CALLER configured:

The makedumpfile --dump-dmesg output:

test-makedumpfile-dump-dmesg-v5.4.264_CONFIG_PRINTK_CALLER.txt.gz

The crash utility: log -P output

testing_crash_linux-5.4.264-with_CONFIG_PRINTK_CALLER.txt.gz

For Linux Kernel 6.6.6 with PRINTK_CALLER configured:

The makedumpfile --dump-dmesg output:

test-makedumpfile-dump-dmesg-v6.6.6.txt.gz

The crash utility: log -P output

testing_crash_linux-6.6.6.txt.gz

egchron commented 6 months ago

Here is the most recent version of the makedumpfile patch submitted upstream for makedumpfile Issue-13 in case you need that patch for testing.

For reference, it was applied and tested to makedumpfile version 1.7.4++ Kazuhito Hagio k-hagio-ab@nec.com has been reviewing that patch so acceptance of the patch is still pending.

makedumpfile-add-dmesg-PRINTK_CALLER-id-support-rev3.patch.gz

egchron commented 5 months ago

Revision 2 patch with recommended revisions per Kazu's review, also posted to the mailing list.

crash-add-log-dmesg-PRINTK_CALLER-id-support-rev2.patch.gz

egchron commented 5 months ago

FYI,

The util-linux folks have merged the dmesg upgrade to support caller_id with the dmesg /dev/kmsg interface so now there is support for PRINTK_CALLER caller_id for both SYSLOG and default dmesg format.

Thank you,

Edward