CodeConstruct / dbus-sensors

D-Bus configurable sensor scanning applications
Apache License 2.0

[issue] fallbackNoSecondary() crashes the nvmed #7

Status: open. drakedog2008 opened this issue 9 months ago.

drakedog2008 commented 9 months ago

The change https://github.com/CodeConstruct/dbus-sensors/commit/c6a4016162c845ca7c3beaf3d7bbdeed2d57579a causes nvmesensor to crash with a segmentation fault (SIGSEGV) on the Google release:

2023-12-01 3:49:48 systemd,Started NVMe Sensor.
2023-12-01 3:49:48 nvmesensor,error getting  SpecialMode status No route to host
2023-12-01 3:49:54 nvmesensor,"[bus: 6, addr: 48, eid: 17]fail to do nvme identify:"
2023-12-01 3:49:54 nvmesensor,adminIdentify:NVMe MI: Invalid Parameter (MI status 0x4)
2023-12-01 3:49:54 nvmesensor,fail to do nvme identify: NVMe MI: Invalid Parameter (MI status 0x4)
2023-12-01 3:49:54 nvmesensor,fail to identify secondary controller list
2023-12-01 3:49:54 nvmesensor,"Failed to identify secondary controller list. error NVMe MI: Invalid Parameter (MI status 0x4) data size 4096 expected size 4096. Fallback, using arbitrary controller as primary."  
2023-12-01 3:49:54  systemd,Started Process Core Dump (PID 4452/UID 0).
2023-12-01 3:50:00  systemd-coredump,Process 4432 (nvmesensor) of user 0 dumped core.
2023-12-01 3:50:00  systemd,"xyz.openbmc_project.nvmesensor.service: Main process exited, code=dumped, status=11/SEGV"
2023-12-01 3:50:00 systemd,xyz.openbmc_project.nvmesensor.service: Failed with result 'core-dump'.

Reverted the change in: https://gbmc-review.googlesource.com/c/dbus-sensors/+/14474

drakedog2008 commented 9 months ago

Google addressed the same issue with the patch: https://gbmc.googlesource.com/dbus-sensors/+/622891d5d3476e868d9b2a4a150e28717f1f5979%5E%21/#F0

The patch is based on two assumptions:

If an NVMe device does not support SR-IOV, or has it disabled, it should expose only one controller, the physical function (PF).

If an NVMe device supports and has enabled SR-IOV, it should support id_secondary_controller_list().
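A minimal C++ sketch of the selection rule these two assumptions imply, offered for discussion only. `ControllerInfo`, `SecondaryList` and `selectPrimary` are hypothetical names, not the nvmesensor or libnvme-mi API, and the sketch further assumes that any controller not reported in the secondary controller list is the primary. When the secondary controller list query fails and more than one controller exists, it returns `std::nullopt` instead of picking an arbitrary controller, and it checks for an empty controller list before touching it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of a controller discovered on the device.
struct ControllerInfo {
    std::uint16_t id;
};

// Hypothetical result of the secondary controller list query:
// std::nullopt models "Identify failed" (e.g. MI status 0x4, Invalid Parameter),
// otherwise it holds the IDs of the secondary controllers.
using SecondaryList = std::optional<std::vector<std::uint16_t>>;

// Choose the primary controller ID following the two assumptions:
//  1. Without SR-IOV the device exposes exactly one controller, the PF.
//  2. With SR-IOV the secondary controller list query succeeds.
// Returns std::nullopt rather than guessing when neither case applies.
std::optional<std::uint16_t> selectPrimary(
    const std::vector<ControllerInfo>& controllers,
    const SecondaryList& secondaries)
{
    // Defensive: never dereference front() of an empty list.
    if (controllers.empty()) {
        return std::nullopt;
    }

    if (!secondaries) {
        // Query failed: per assumption 1, only a single controller is unambiguous.
        if (controllers.size() == 1) {
            return controllers.front().id;
        }
        return std::nullopt; // several controllers but no list: refuse to guess
    }

    // Query succeeded: treat the first controller not listed as secondary
    // as the primary (simplifying assumption of this sketch).
    for (const auto& c : controllers) {
        if (std::find(secondaries->begin(), secondaries->end(), c.id) ==
            secondaries->end()) {
            return c.id;
        }
    }
    return std::nullopt; // every controller was reported as secondary
}
```

For example, a device with controllers 1, 2 and 3 whose secondary list reports {2, 3} yields primary 1, while the same device with a failed query yields no primary instead of an arbitrary pick, so the caller has to handle that case explicitly.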

drakedog2008 commented 9 months ago

@mkj