chef / ohai

Ohai profiles your system and emits JSON
https://docs.chef.io/ohai.html
Apache License 2.0

lscpu on Debian VM on ARM system crashes CPU plugin #1805

Closed fuegas closed 1 month ago

fuegas commented 1 year ago

Description

Running lscpu on a Debian (or other Linux) VM on an arm64 host (for example, an Apple M2 system) does not output the line Core(s) per socket; it outputs Core(s) per cluster instead. Because Core(s) per socket is missing from the output, the line that calculates the number of CPUs crashes:

[2023-09-01T11:36:36+02:00] TRACE: Plugin CPU threw #<TypeError: nil can't be coerced into Integer>
[2023-09-01T11:36:36+02:00] TRACE: /opt/chef/embedded/lib/ruby/gems/3.1.0/gems/ohai-18.0.26/lib/ohai/plugins/cpu.rb:201:in `*'
[2023-09-01T11:36:36+02:00] TRACE: /opt/chef/embedded/lib/ruby/gems/3.1.0/gems/ohai-18.0.26/lib/ohai/plugins/cpu.rb:201:in `parse_lscpu'
[2023-09-01T11:36:36+02:00] TRACE: /opt/chef/embedded/lib/ruby/gems/3.1.0/gems/ohai-18.0.26/lib/ohai/plugins/cpu.rb:362:in `block (2 levels) in <main>'

The output of lscpu on a Debian 12 VM on an amd64 system gives (truncated to the relevant part):

Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        AMD
  Model name:            AMD EPYC 7F52 16-Core Processor
    BIOS Model name:       CPU @ 3.5GHz
    BIOS CPU family:     1
    CPU family:          23
    Model:               49
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            0

The output of lscpu on a Debian 12 VM on an arm64 system gives (truncated to the relevant part):

Vendor ID:               ARM
  BIOS Vendor ID:        Apple
  Model name:            -
    BIOS Model name:     Apple Silicon None CPU @ 2.0GHz
    BIOS CPU family:     257
    Model:               0
    Thread(s) per core:  1
    Core(s) per cluster: 2
    Socket(s):           1
    Cluster(s):          1
    Stepping:            r0p0
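
To illustrate the difference between the two outputs, here is a minimal sketch of a parser that accepts either label. It is not the ohai implementation: parse_lscpu_counts is a hypothetical helper, and treating Core(s) per cluster as a stand-in for Core(s) per socket is an assumption that happens to hold for the output above (one socket, one cluster).

```ruby
# Hypothetical helper, not part of ohai: pull the socket/core/thread counts
# out of plain `lscpu` text, accepting either the "socket" or "cluster" label.
def parse_lscpu_counts(output)
  info = {}
  output.each_line do |line|
    key, value = line.split(":", 2).map(&:strip)
    next if value.nil? || value.empty?

    case key
    when "Thread(s) per core"
      info[:threads_per_core] = value.to_i
    when "Core(s) per socket", "Core(s) per cluster"
      # Assumption: cores per cluster can stand in for cores per socket.
      info[:cores_per_socket] = value.to_i
    when "Socket(s)"
      info[:sockets] = value.to_i
    end
  end
  info
end

# Trimmed copies of the two outputs shown above.
amd64 = <<~LSCPU
  Vendor ID:               AuthenticAMD
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           1
LSCPU

arm64 = <<~LSCPU
  Vendor ID:               ARM
    Thread(s) per core:  1
    Core(s) per cluster: 2
    Socket(s):           1
    Cluster(s):          1
LSCPU

p parse_lscpu_counts(amd64)
p parse_lscpu_counts(arm64)
```

With this sketch both outputs yield the same three keys, so the later multiplication never sees nil for these two samples.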

Ohai Version

# ohai --version
Ohai: 18.0.26

Platform Version

Debian 12 on an Apple M2 MacBook Pro.

Ohai Output

I've added some debug output at line 196 of the CPU plugin:

        else
          logger.info "sockets: #{lscpu_info[:sockets].inspect}"
          logger.info "cores_per_socket: #{lscpu_info[:cores_per_socket].inspect}"
          logger.info "threads_per_core: #{lscpu_info[:threads_per_core].inspect}"
          threads_per_core = [lscpu_info[:threads_per_core], 1].max
          lscpu_total = lscpu_info[:sockets] * lscpu_info[:cores_per_socket] * threads_per_core
          lscpu_real = lscpu_info[:sockets]
          lscpu_cores = lscpu_info[:sockets] * lscpu_info[:cores_per_socket]
        end

This results in:

[2023-09-01T11:42:55+02:00] TRACE: Plugin CPU: real cpu & core data is missing in /proc/cpuinfo and lscpu
[2023-09-01T11:42:55+02:00] TRACE: Plugin CPU: ran 'lscpu' and returned 0
[2023-09-01T11:42:55+02:00] TRACE: Plugin CPU: ran 'lscpu -p=CPU,CORE,SOCKET' and returned 0
[2023-09-01T11:42:55+02:00] INFO: sockets: 1
[2023-09-01T11:42:55+02:00] INFO: cores_per_socket: nil
[2023-09-01T11:42:55+02:00] INFO: threads_per_core: 1
[2023-09-01T11:42:55+02:00] TRACE: Plugin CPU threw #<TypeError: nil can't be coerced into Integer>
[2023-09-01T11:42:55+02:00] TRACE: /opt/chef/embedded/lib/ruby/gems/3.1.0/gems/ohai-18.0.26/lib/ohai/plugins/cpu.rb:200:in `*'
[2023-09-01T11:42:55+02:00] TRACE: /opt/chef/embedded/lib/ruby/gems/3.1.0/gems/ohai-18.0.26/lib/ohai/plugins/cpu.rb:200:in `parse_lscpu'
[2023-09-01T11:42:55+02:00] TRACE: /opt/chef/embedded/lib/ruby/gems/3.1.0/gems/ohai-18.0.26/lib/ohai/plugins/cpu.rb:361:in `block (2 levels) in <main>'
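
The TypeError is ordinary Ruby behavior when an Integer is multiplied by nil. The sketch below reproduces it in isolation and shows one possible guard. lscpu_totals is a hypothetical helper written for this illustration, not the code in cpu.rb and not the eventual fix; returning nil when a count is missing is an assumption about what a caller would want.

```ruby
# Reproduce the failure in isolation: Integer#* cannot coerce nil.
begin
  result = 1 * nil
rescue TypeError => e
  message = e.message # "nil can't be coerced into Integer"
end
puts message

# Hypothetical guard (not the actual ohai code): skip the calculation
# when lscpu did not report sockets or cores per socket.
def lscpu_totals(lscpu_info)
  sockets = lscpu_info[:sockets]
  cores_per_socket = lscpu_info[:cores_per_socket]
  return nil if sockets.nil? || cores_per_socket.nil?

  # nil.to_i is 0, so a missing thread count falls back to 1.
  threads_per_core = [lscpu_info[:threads_per_core].to_i, 1].max
  cores = sockets * cores_per_socket
  { total: cores * threads_per_core, real: sockets, cores: cores }
end

# Values taken from the debug output above: cores_per_socket is nil.
p lscpu_totals(sockets: 1, cores_per_socket: nil, threads_per_core: 1)
# Values as they would be if the core count had been parsed.
p lscpu_totals(sockets: 1, cores_per_socket: 2, threads_per_core: 1)
```

A guard like this only avoids the crash; the plugin would still need to read the core count from somewhere else to report correct data.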

ramereth commented 11 months ago

This is due to a newer version of util-linux (>= 2.37.0) changing the formatting:

lscpu(1) has been reimplemented. Now it analyzes /sys for all CPUs and provides information for all CPU types used by the system (for example heterogeneous big.LITTLE ARMs, etc.). This command reads also SMBIOS tables to get CPU identifiers. Thanks to Masayoshi Mizuma from Fujitsu and Jeffrey Bastian from Red Hat. The default output on the terminal is more structured now to be more human-readable.

This also breaks on RHEL >= 9 and Ubuntu >= 22.04. I am working on a PR to resolve this.