criteo / hwbench

hwbench is a benchmark orchestration tool to automate the low-level testing of servers.
Apache License 2.0

runtime error on Dell/Intel server #25

Open ezekriSCW opened 3 months ago

ezekriSCW commented 3 months ago

When trying to use hwbench on a Dell C6420 server (Intel Xeon Silver 4114), an error is raised with the following backtrace. Note: it seems that the CPU vendor hasn't been detected properly.

# python3 -m hwbench.hwbench -j configs/simple.conf -m monitoring.cfg
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/hwbench/hwbench/hwbench.py", line 10, in <module>
    from .bench import benchmarks
  File "/tmp/hwbench/hwbench/bench/benchmarks.py", line 6, in <module>
    from .benchmark import Benchmark
  File "/tmp/hwbench/hwbench/bench/benchmark.py", line 4, in <module>
    from .parameters import BenchmarkParameters
  File "/tmp/hwbench/hwbench/bench/parameters.py", line 3, in <module>
    from .monitoring import Monitoring
  File "/tmp/hwbench/hwbench/bench/monitoring.py", line 4, in <module>
    from ..environment.hardware import BaseHardware
  File "/tmp/hwbench/hwbench/environment/hardware.py", line 7, in <module>
    from .vendors.detect import first_matching_vendor
  File "/tmp/hwbench/hwbench/environment/vendors/detect.py", line 3, in <module>
    from .amd.amd import Amd
  File "/tmp/hwbench/hwbench/environment/vendors/amd/amd.py", line 1, in <module>
    from ..vendor import Vendor
  File "/tmp/hwbench/hwbench/environment/vendors/vendor.py", line 11, in <module>
    from ...utils.external import External
  File "/tmp/hwbench/hwbench/utils/external.py", line 8, in <module>
    class External(ABC):
  File "/tmp/hwbench/hwbench/utils/external.py", line 14, in External
    def run_cmd(self) -> list[str]:
TypeError: 'type' object is not subscriptable

ErwanAliasr1 commented 3 months ago

Can you share the output of "dmidecode -t 1", "dmidecode -t 3", and "lscpu"?

ErwanAliasr1 commented 3 months ago

Please note that you are running Python 3.8, while 3.9 is the minimum supported release.
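For context, the `TypeError` in the traceback comes from the annotation `list[str]`: subscripting the built-in `list` type is only supported from Python 3.9 onward (PEP 585), and annotations are evaluated when the function is defined. The sketch below is only an illustration of an annotation style that also runs on 3.8; the `Echo` subclass is hypothetical, and the project chose to require 3.9 rather than change its annotations.

```python
from abc import ABC, abstractmethod
from typing import List


class External(ABC):
    """Illustrative stand-in for the class named in the traceback."""

    @abstractmethod
    def run_cmd(self) -> List[str]:  # typing.List works on 3.8; list[str] needs 3.9+
        """Return the command line to run, one argument per list item."""


class Echo(External):
    """Hypothetical subclass, present only to make the example runnable."""

    def run_cmd(self) -> List[str]:
        return ["echo", "hello"]


cmd = Echo().run_cmd()
```
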

ezekriSCW commented 3 months ago

Arf! I missed that! I'll update Python first, then check whether I still get the same error. Thanks.

ErwanAliasr1 commented 3 months ago

I'm pushing a change to ensure no one runs below the minimal release. https://github.com/criteo/hwbench/pull/26

ErwanAliasr1 commented 3 months ago

PR #26 is merged, to prevent hwbench from being run on an unsupported Python release in the future.
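A minimal sketch of what such a guard can look like, assuming the 3.9 minimum stated above. This is not the code from PR #26; the function name `check_python_version` and the message text are hypothetical.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum release stated in this thread


def check_python_version(current=None, minimum=MIN_PYTHON):
    """Return an error message if `current` is older than `minimum`, else None.

    `current` defaults to the (major, minor) pair of the running interpreter.
    """
    if current is None:
        current = sys.version_info[:2]
    if tuple(current) < tuple(minimum):
        return "Python {}.{} or newer is required, found {}.{}".format(
            minimum[0], minimum[1], current[0], current[1]
        )
    return None
```

An entry point could call `sys.exit(check_python_version())` when the result is not `None`. For the check to help, it has to run before any module using 3.9-only constructs is imported, and it must avoid such constructs itself.
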

ezekriSCW commented 3 months ago

Python is now upgraded to 3.9. I still get an error, but a different one:

# python3 -m hwbench.hwbench -j configs/simple.conf -m monitoring.cfg
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/hwbench/hwbench/hwbench.py", line 108, in <module>
    main()
  File "/tmp/hwbench/hwbench/hwbench.py", line 30, in main
    tuning_setup.Tuning(tuning_out_dir).apply()
  File "/tmp/hwbench/hwbench/tuning/setup.py", line 15, in apply
    PerformancePowerProfile(self.out_dir).run()
  File "/tmp/hwbench/hwbench/tuning/power_profile.py", line 33, in run
    previous = file.read_text(encoding="utf-8").rstrip()
  File "/usr/lib/python3.9/pathlib.py", line 1256, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/usr/lib/python3.9/pathlib.py", line 1242, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.9/pathlib.py", line 1110, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/devices/system/cpu/cpu21/cpufreq/scaling_governor'
# dmidecode -t 1
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0100, DMI type 1, 27 bytes
System Information
    Manufacturer: Dell Inc.
    Product Name: PowerEdge C6420
    Version: Not Specified
    Serial Number: xxxxxx
    UUID: xxxxxxx
    Wake-up Type: Power Switch
    SKU Number: SKU=0757;ModelName=PowerEdge C6420
    Family: PowerEdge
# dmidecode -t 3
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0300, DMI type 3, 22 bytes
Chassis Information
    Manufacturer: Dell Inc.
    Type: Rack Mount Chassis
    Lock: Not Present
    Version: PowerEdge C6400
    Serial Number: xxxxxxx
    Asset Tag: Not Specified
    Boot-up State: Safe
    Power Supply State: Safe
    Thermal State: Safe
    Security Status: Unknown
    OEM Information: 0x00000000
    Height: 2 U
    Number Of Power Cords: 2
    Contained Elements: 0
    SKU Number: Not Specified

(serial numbers hidden)

ErwanAliasr1 commented 2 months ago

@ezekriSCW can you provide the output of "find /sys/devices/system/cpu", "uname -a", "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors", and "cat /sys/devices/system/cpu/cpuidle/available_governors", please?

ezekriSCW commented 2 months ago

find_cpu.txt

# uname -a
Linux poweredge-c6420-xxxxxx 5.8.0-38-generic #43~20.04.1-Ubuntu SMP Tue Jan 12 16:39:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
cat: /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors: No such file or directory
# cat /sys/devices/system/cpu/cpuidle/available_governors
ladder menu teo

ErwanAliasr1 commented 1 month ago

Please confirm that the latest code, with PR #30 merged, solves your case.
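The outputs above suggest that this machine exposes no cpufreq files under sysfs, which is why reading `scaling_governor` raised `FileNotFoundError`. As an illustration only (this is not the change made in PR #30, and `read_scaling_governors` is a hypothetical helper), a reader that tolerates missing cpufreq entries could look like this:

```python
from pathlib import Path
from typing import Dict, Optional


def read_scaling_governors(
    cpu_root: Path = Path("/sys/devices/system/cpu"),
) -> Dict[str, Optional[str]]:
    """Map each cpuN directory name to its scaling governor.

    A CPU without a cpufreq/scaling_governor file maps to None
    instead of raising FileNotFoundError.
    """
    governors: Dict[str, Optional[str]] = {}
    # "cpu[0-9]*" matches cpu0, cpu21, ... but not cpufreq or cpuidle.
    for cpu_dir in sorted(cpu_root.glob("cpu[0-9]*")):
        governor_file = cpu_dir / "cpufreq" / "scaling_governor"
        try:
            governors[cpu_dir.name] = governor_file.read_text(encoding="utf-8").rstrip()
        except FileNotFoundError:
            governors[cpu_dir.name] = None
    return governors
```

A caller could then skip the power-profile tuning step, or log a warning, for every CPU whose value is `None`.
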