chef / ohai

Ohai profiles your system and emits JSON
https://docs.chef.io/ohai.html
Apache License 2.0
681 stars 452 forks source link

[lspci] Fix handling of more than one PCIe domain #1822

Closed Babar closed 3 months ago

Babar commented 4 months ago

Description

(sorry, I'm using domain and root port interchangeably. Maybe domain is more correct, if so I can update the code to use that instead of root_port)

I'm guessing most machines only have one PCIe root port, but things like ARM hosts tend to have more, and that breaks as it creates collisions.

As we want to keep backwards compatibility, and lspci is run with the defaults that doesn't show the root port if it's 0000, only add it when it's non-zero, and also add it as a new extra field called root_port to denote the fact that there is one (useful when using the BDF to find files on disk, as when there is no root_port, one needs to add the extra 0000:.

More detailed explanation I wrote on the slack channel: On Linux, we run lspci -vnnmk, which will show the BDF in the first Device: field. So the first bug is that the regular expression on https://github.com/chef/ohai/blob/main/lib/ohai/plugins/linux/lspci.rb#L54 is missing a \ before the ., therefore will will match 0001:02:03.4 as 01:02:03 which is plain broken. But fixing that we end up with 02:03.4 which is also plain wrong as it loses the root complex. I see a few ways we could fix this:

  1. Highly disruptive: we add -D to lspci, which means we'll always get the root complex even when it's 0000 . This is terrible 'cause it will basically break everything
  2. We do this but only with an option in ohai. Less terrible, but means people have to know about that option
  3. We do this but only when we notice more than one root complex. That's not really doable
  4. We add a new field, root_complex in the PCI structure, which is 0000 by default and the actual root complex for everything else

I like 4 the most, as it mirrors closely what lspci does (https://github.com/pciutils/pciutils/blob/master/lspci.c#L285) or maybe

  1. We do it the same way lspci does it: if the root complex is 0, we just store the BDF as the ID, if it's not, we store the root complex + BDF we can't really do 4 'cause we might have collisions... Like if you have 0000:01:02.3 and 0001:01:02.3 then they have to be stored differently. So we can either do: Option A: 01:02.3 and 0001:01:02.3 (this can co-exist with everything today) Option B: 00000:1:02.3 and 0001:01:02.3 (this most likely needs to become pci2)

Test Plan: Before:

$ ohai | jq .pci\|keys
[
  "00:00.0",
  "01:00.0",
  "01:00.1",
  "02:00:0",
  "04:00:0",
  "08:00:0",
  "08:01:0",
  "08:02:0",
  "08:03:0",
  "08:04:0",
  "08:05:0",
  "09:00:0",
  "09:01:0"
]

After:

$ ohai | jq .pci\|keys
[
  "0002:00:00.0",
  "0004:00:00.0",
  "0008:00:00.0",
  "0008:01:00.0",
  "0008:02:01.0",
  "0008:02:02.0",
  "0008:03:00.0",
  "0008:04:00.0",
  "0008:05:00.0",
  "0009:00:00.0",
  "0009:01:00.0",
  "00:00.0",
  "01:00.0",
  "01:00.1"
]

Related Issue

pci collection should always include the domain #1693

Types of changes

Checklist:

tpowell-progress commented 4 months ago

@Babar can you add a couple cases in spec/unit/plugins/linux/lspci_spec.rb to match the additional regex component specifically?

sonarcloud[bot] commented 4 months ago

Quality Gate Passed Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Babar commented 4 months ago

@tpowell-progress : added one. Can add more but wasn't sure what I should be testing.