crash-utility / crash

Linux kernel crash utility
https://crash-utility.github.io

[ARM] Crash failed to parse the core header when makedumpfile is compiled with -D_TIME_BITS=64 #177

Open zyxiaooo opened 3 months ago

zyxiaooo commented 3 months ago

Hi,

I have ARM platforms running kernel 5.15. We recently switched to 64-bit time and found that crash fails to open the core with the following error:

crash: diskdump / compressed kdump: cannot malloc block_size buffer

In the core header, all data after the timestamp field is shifted by 12 bytes:

struct disk_dump_header {
        char                    signature[SIG_LEN];     /* = "DISKDUMP" */
        int                     header_version; /* Dump header version */
        struct new_utsname      utsname;        /* copy of system_utsname */
        struct timeval          timestamp;      /* Time stamp */
        uint8_t dummy[12];      /* <<< adding this temporarily works around the issue */

I also tried to compile crash with the following command to match the makedumpfile build:

 make target=ARM CFLAGS="-D_TIME_BITS=64"

But got another error:

WARNING: compressed kdump: invalid nr_cpus value: 0
Segmentation fault

Any idea how to correctly handle the core dump with ARM + -D_TIME_BITS=64 ?

liutgnu commented 3 months ago

Hi,

I have ARM platforms running kernel 5.15. We recently switched to 64-bit time and found that crash fails to open the core with the following error:

crash: diskdump / compressed kdump: cannot malloc block_size buffer

This error message just reports that realloc failed in diskdump.c:read_dump_header(). Could you check why realloc fails? Is it a memory shortage or an incorrect block_size value? Printing strerror() may also help.
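To illustrate the strerror() suggestion, here is a minimal sketch of an allocation wrapper that says why a resize failed. The helper name resize_with_diagnostic and its message text are hypothetical and are not part of crash. It treats a requested size of 0 as its own case, because on some C libraries realloc with size 0 frees the buffer and returns NULL without setting errno, which would look like an out-of-memory failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not crash code): resize buf to size bytes.
 * On failure, returns NULL and writes the reason into msg.
 * On success, returns the new pointer and sets msg to "".
 * A size of 0 is rejected up front and buf is left untouched.
 */
void *resize_with_diagnostic(void *buf, size_t size, char *msg, size_t msglen)
{
    void *p;

    assert(msg != NULL && msglen > 0);

    if (size == 0) {
        snprintf(msg, msglen,
                 "refusing to allocate: requested size is 0 (bad header?)");
        return NULL;
    }

    errno = 0;
    p = realloc(buf, size);
    if (p == NULL)
        snprintf(msg, msglen, "realloc(%zu) failed: %s",
                 size, strerror(errno));
    else
        msg[0] = '\0';
    return p;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to get a standalone demo. */
int main(void)
{
    char msg[128];
    void *p = resize_with_diagnostic(NULL, 0, msg, sizeof msg);

    if (p == NULL)
        fprintf(stderr, "%s\n", msg);
    free(p);
    return 0;
}
#endif
```

With a wrapper like this, a block_size of 0 read from a mismatched header would be reported as a bad size rather than as a generic allocation failure.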

In the core header, all data after the timestamp field is shifted by 12 bytes:

struct disk_dump_header {
        char                    signature[SIG_LEN];     /* = "DISKDUMP" */
        int                     header_version; /* Dump header version */
        struct new_utsname      utsname;        /* copy of system_utsname */
        struct timeval          timestamp;      /* Time stamp */
        uint8_t dummy[12];      /* <<< adding this temporarily works around the issue */

Yeah, that makes sense, because a 64-bit timestamp takes more space.
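The 12-byte figure can be reproduced with a small layout model. This is a sketch under stated assumptions, not the real crash or makedumpfile headers: SIG_LEN is taken as 8 (the length of "DISKDUMP"), struct new_utsname is modeled as six 65-byte strings, and the 32-bit ARM ABI is assumed to align 64-bit integers to 8 bytes. All struct and function names are hypothetical. With 32-bit time the timestamp starts at offset 404 and ends at 412; with 64-bit time it starts at 408 and ends at 424, so everything after it moves by 12 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SIG_LEN       8   /* assumed: length of "DISKDUMP" */
#define UTS_FIELD_LEN 65  /* assumed size of each new_utsname string */

/* Stand-in for struct new_utsname: six 65-byte strings (390 bytes). */
struct utsname_model { char field[6][UTS_FIELD_LEN]; };

/* Assumed 32-bit ARM timeval with 32-bit time: 4 + 4 bytes. */
struct timeval32_model { int32_t tv_sec; int32_t tv_usec; };

/*
 * Assumed 32-bit ARM timeval with _TIME_BITS=64: 8 + 8 bytes.
 * _Alignas pins 8-byte alignment so the result is host-independent.
 */
struct timeval64_model { _Alignas(8) int64_t tv_sec; int64_t tv_usec; };

/* Header fields up to and including the timestamp, 32-bit time. */
struct header_prefix32 {
    char signature[SIG_LEN];
    int32_t header_version;
    struct utsname_model utsname;
    struct timeval32_model timestamp;
};

/* Same fields, 64-bit time. */
struct header_prefix64 {
    char signature[SIG_LEN];
    int32_t header_version;
    struct utsname_model utsname;
    struct timeval64_model timestamp;
};

/* Offset of the first byte after the timestamp, 32-bit time. */
size_t timestamp_end32(void)
{
    return offsetof(struct header_prefix32, timestamp)
           + sizeof(struct timeval32_model);
}

/* Offset of the first byte after the timestamp, 64-bit time. */
size_t timestamp_end64(void)
{
    return offsetof(struct header_prefix64, timestamp)
           + sizeof(struct timeval64_model);
}

/* How far the fields after the timestamp move between the two layouts. */
size_t header_shift(void)
{
    return timestamp_end64() - timestamp_end32();
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to get a standalone demo. */
int main(void)
{
    printf("32-bit time: timestamp ends at %zu\n", timestamp_end32());
    printf("64-bit time: timestamp ends at %zu\n", timestamp_end64());
    printf("shift: %zu\n", header_shift());
    assert(header_shift() == 12);
    return 0;
}
#endif
```

Only 8 of the 12 bytes come from the larger struct; the other 4 are extra padding in front of it, because the timestamp now has to start on an 8-byte boundary.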

I also tried to compile crash with the following command to match the makedumpfile build:

 make target=ARM CFLAGS="-D_TIME_BITS=64"

On my computer (Fedora 38):

$ cat /usr/include/bits/types/struct_timeval.h
struct timeval {
#ifdef __USE_TIME_BITS64
  __time64_t tv_sec;            /* Seconds.  */
  __suseconds64_t tv_usec;      /* Microseconds.  */
#else
  __time_t tv_sec;              /* Seconds.  */
  __suseconds_t tv_usec;        /* Microseconds.  */
#endif
};

I guess (not tried) it should be CFLAGS="-D__USE_TIME_BITS64" in order to enable the 64-bit timestamp.

But got another error:

WARNING: compressed kdump: invalid nr_cpus value: 0
Segmentation fault

A segfault can have many causes. A gdb backtrace (bt) would help with further debugging.

Any idea how to correctly handle the core dump with ARM + -D_TIME_BITS=64 ?

zyxiaooo commented 3 months ago

Thanks for the reply.

crash: diskdump / compressed kdump: cannot malloc block_size buffer

This is caused by the header mismatch: the block_size that crash reads is 0.

I guess (not tried) it should be CFLAGS="-D__USE_TIME_BITS64" in order to enable the 64-bit timestamp.

I also tried this but got the same error.
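One way to check whether a given set of flags really changes the time types is to build a tiny probe with the same compiler and flags as crash. As far as I know, glibc 2.34 and later accept -D_TIME_BITS=64 only together with -D_FILE_OFFSET_BITS=64, and __USE_TIME_BITS64 is an internal macro that the glibc headers set themselves, so defining it by hand is not the supported switch. The function names below are hypothetical; this is a sketch, not something from the crash tree.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Width of time_t in bits, as seen with the current compiler flags. */
int time_bits(void)
{
    return (int)(sizeof(time_t) * CHAR_BIT);
}

/* Size of struct timeval in bytes, as seen with the current flags. */
size_t timeval_bytes(void)
{
    return sizeof(struct timeval);
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN plus the flags under test. */
int main(void)
{
    /* Sanity check: common platforms use a 32-bit or 64-bit time_t. */
    assert(time_bits() == 32 || time_bits() == 64);

    printf("time_t:         %d bits\n", time_bits());
    printf("struct timeval: %zu bytes\n", timeval_bytes());
    return 0;
}
#endif
```

On a 32-bit ARM toolchain, a struct timeval of 8 bytes would mean the flags had no effect, while 16 bytes would mean the 64-bit layout is in use.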

WARNING: compressed kdump: invalid nr_cpus value: 0
Segmentation fault

I think this is still the header mismatch, because nr_cpus is not 0 in the test core. I haven't had a chance to dig further, though.

liutgnu commented 3 months ago

Yeah, block_size == 0 is abnormal. It comes from the disk_dump_header, which is written by makedumpfile. It would be best to take the vmcore, dump the disk_dump_header as hex, and verify whether the problem is caused by makedumpfile or by the kernel itself.
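As a complement to a plain hex dump, the header can be probed at the two offsets where block_size would sit under each layout. This is a heuristic sketch that relies on several assumptions: a little-endian 32-bit ARM dump, a disk_dump_header at file offset 0 (the non-flat format), a 4-byte status field between timestamp and block_size, and the offsets 416 and 428 derived from an 8-byte versus a 16-byte struct timeval. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed offsets of block_size on 32-bit ARM (derived, not measured). */
#define BLOCK_SIZE_OFF_TIME32 416u  /* 412 (timestamp end) + 4 (status) */
#define BLOCK_SIZE_OFF_TIME64 428u  /* 424 (timestamp end) + 4 (status) */
#define HEADER_PROBE_LEN      432u  /* enough bytes to cover both offsets */

enum layout_guess { LAYOUT_UNKNOWN = 0, LAYOUT_TIME32 = 32, LAYOUT_TIME64 = 64 };

/* Decode a little-endian 32-bit value at buf[off..off+3]. */
uint32_t read_le32(const unsigned char *buf, size_t off)
{
    return (uint32_t)buf[off]
         | (uint32_t)buf[off + 1] << 8
         | (uint32_t)buf[off + 2] << 16
         | (uint32_t)buf[off + 3] << 24;
}

/* Treat a block size as plausible if it is a nonzero power of two. */
int plausible_block_size(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Guess the layout from the first len bytes; ambiguous cases stay unknown. */
enum layout_guess guess_layout(const unsigned char *hdr, size_t len)
{
    int ok32, ok64;

    assert(hdr != NULL);
    if (len < HEADER_PROBE_LEN)
        return LAYOUT_UNKNOWN;

    ok32 = plausible_block_size(read_le32(hdr, BLOCK_SIZE_OFF_TIME32));
    ok64 = plausible_block_size(read_le32(hdr, BLOCK_SIZE_OFF_TIME64));
    if (ok32 && !ok64)
        return LAYOUT_TIME32;
    if (ok64 && !ok32)
        return LAYOUT_TIME64;
    return LAYOUT_UNKNOWN;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN; run as: ./probe <vmcore> */
int main(int argc, char **argv)
{
    unsigned char hdr[HEADER_PROBE_LEN];
    FILE *f;
    size_t n;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <vmcore>\n", argv[0]);
        return 2;
    }
    f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror(argv[1]);
        return 1;
    }
    n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n < sizeof hdr) {
        fprintf(stderr, "short read: %zu bytes\n", n);
        return 1;
    }
    printf("value at 416: %u\n", (unsigned)read_le32(hdr, BLOCK_SIZE_OFF_TIME32));
    printf("value at 428: %u\n", (unsigned)read_le32(hdr, BLOCK_SIZE_OFF_TIME64));
    printf("layout guess: %d\n", (int)guess_layout(hdr, n));
    return 0;
}
#endif
```

If the writer used the 16-byte timeval and stored 0 in tv_usec, a reader built for the 8-byte layout would find 0 at offset 416, which would be consistent with the block_size of 0 reported above.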

zyxiaooo commented 3 months ago

With kernel 5.15 and a makedumpfile compiled with -D_TIME_BITS=64, I hexdumped the generated core header and can see 12 extra bytes around the timestamp field.

With exactly the same kernel, and a makedumpfile compiled WITHOUT -D_TIME_BITS=64, everything works fine.

So I guess there is some issue in makedumpfile when it is built with that flag.

Note: I'm not sure whether it is related, but we generate the core in flat mode first (makedumpfile -F -c) and then convert it back to non-flat mode (makedumpfile -R). I mention it in case the issue only occurs in this scenario.

liutgnu commented 3 months ago

I'm not sure either. Sorry, I cannot provide any further useful info.