crash-utility / crash

Linux kernel crash utility
https://crash-utility.github.io
837 stars 275 forks source link

`struct` memberlist and per-cpu `cpuspec` features incompatible / have confusing UX #103

Open problame opened 3 years ago

problame commented 3 years ago

My use case for crash in the last two days was to inspect a mountpoint's struct mount::mnt_pcp member. That member is a __percpu pointer, allocated here and defined here.

Expected UX for getting the per-cpu values:

crash> mount | head -n 1
     MOUNT           SUPERBLK     TYPE   DEVNAME   DIRNAME
crash> mount | grep mnt
ffff986121bfe880 ffff986003c0d800 ext4   /dev/loop4 /mnt
crash> struct mount.mnt_pcp -p ffff986121bfe880:a
[0]: ffffd1a5bc200eb0
struct mnt_pcp {
  mnt_count = 1,
  mnt_writers = 0
}
[1]: ffffd1a5bc240eb0
struct mnt_pcp {
  mnt_count = 1,
  mnt_writers = 0
}
[2]: ffffd1a5bc280eb0
struct mnt_pcp {
  mnt_count = 0,
  mnt_writers = 0
}
[3]: ffffd1a5bc2c0eb0
struct mnt_pcp {
  mnt_count = 0,
  mnt_writers = 0
}

Actual behavior with the last command:

crash> struct mount.mnt_pcp -p ffff986121bfe880:a
[0]: ffff30c2515fe880
struct: cannot access mount.mnt_pcp ffff30c2515fe8c8
[1]: ffff30c25163e880
struct: cannot access mount.mnt_pcp ffff30c25163e8c8
[2]: ffff30c25167e880
struct: cannot access mount.mnt_pcp ffff30c25167e8c8
[3]: ffff30c2516be880
struct: cannot access mount.mnt_pcp ffff30c2516be8c8

These invalid addresses are the result of ffff986121bfe880 + kt->__per_cpu_offset[...] computed here:

https://github.com/crash-utility/crash/blob/995db8ab88916b6397676b67be98c0a4f82cca49/symbols.c#L6679-L6690

Looking at that code, I figured out how to get the data I wanted. First, one needs to first get the per-cpu pointer value using the memberlist notation. That'd be 0x39448c800eb0 in my example. And then, one needs to use the :cpuspec notation with the member type. Like so:


crash> mount | grep mnt
ffff986121bfe880 ffff986003c0d800 ext4   /dev/loop4 /mnt
crash> struct mount.mnt_pcp ffff986121bfe880
  mnt_pcp = 0x39448c800eb0
crash> struct mnt_pcp 0x39448c800eb0:a
[0]: ffffd1a5bc200eb0
struct mnt_pcp {
  mnt_count = 1,
  mnt_writers = 0
}
[1]: ffffd1a5bc240eb0
struct mnt_pcp {
  mnt_count = 1,
  mnt_writers = 0
}
[2]: ffffd1a5bc280eb0
struct mnt_pcp {
  mnt_count = 0,
  mnt_writers = 0
}
[3]: ffffd1a5bc2c0eb0
struct mnt_pcp {
  mnt_count = 0,
  mnt_writers = 0
}

To make it just work as I expected, we'd need to know whether a pointer is a __percpu pointer. Is that information available?

If it is not, I suggest we make the memberlist syntax and :cpuspec syntax mutually exclusive when parsing the struct command, or at the very least, print a warning.

I'm happy to put in the work and make a PR, but maintainers should first decide which approach should be taken.

k-hagio commented 2 years ago

sorry, still I've not had enough time for this. @lian-bo , can you possibly take a look?

lian-bo commented 2 years ago

@k-hagio Sure. @problame , Christian, thank you for pointing out this issue. Crash can know if it is a percpu symbol via its name, see the definition per_cpu_symbol_search(). I haven't investigated it in detail, if it is a kernel percpu symbol, can you try to check if it(struct mnt_pcp percpu *mnt_pcp) is in the range of [per_cpu_start, per_cpu_end)?

k-hagio commented 2 years ago

@lian-bo, a __percpu pointer allocated by alloc_percpu() is not within the range.

crash> mount.mnt_pcp ffff916f5500b480
  mnt_pcp = 0x340b01007114,
crash> sym __per_cpu_start __per_cpu_end
0 (D) __per_cpu_start
2d000 (D) __per_cpu_end

To make it just work as I expected, we'd need to know whether a pointer is a __percpu pointer. Is that information available?

Hmm, as far as I've searched, looks not.

If it is not, I suggest we make the memberlist syntax and :cpuspec syntax mutually exclusive when parsing the struct command, or at the very least, print a warning.

Yes, I agree on making those mutually exclusive.

Thanks, Kazu