aws / aws-k8s-tester

AWS Kubernetes tester, kubetest2 deployer implementation
Apache License 2.0

Fix unit test #488

Closed Issacwww closed 2 weeks ago

Issacwww commented 2 weeks ago

Issue #, if available:

Description of changes: Encountered several issues when running the unit tests on Bottlerocket:

  1. nvidia-smi not found:
    # Running tests in gpu_unit_tests/tests/test_basic.sh
    common.sh: line 14: nvidia-smi: command not found

    Fixed by adding a resource limit.

  2. dcgmi not found. Fixed by installing datacenter-gpu-manager.

  3. Missing expected data files for test_sysinfo. Followed the README to add the expected files.
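
The first two failures are both "command not found" errors, so a preflight check could surface them before any test runs. The sketch below is only an illustration: `missing_commands` is a hypothetical helper and is not part of the repository's common.sh.

```shell
# Hypothetical preflight helper (an assumption, not code from this PR).
# Prints every command from the argument list that is not on PATH.
# Returns 0 when all commands are present, 1 otherwise.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

A test entrypoint might call it as `missing_commands nvidia-smi dcgmi || exit 1`, which would report the missing tools by name instead of failing inside an individual test.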

Testing:

k logs unit-test-job-nb6gn
# Running tests in gpu_unit_tests/tests/test_basic.sh
ok - test_01_device_query
ok - test_02_vector_add
ok - test_03_bandwidth
ok - test_04_bus_grind
ok - test_05_dcgm_diagnostics
# Running tests in gpu_unit_tests/tests/test_sysinfo.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:03 --:--:--     0
curl: (56) Recv failure: Connection reset by peer
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    10  100    10    0     0  12787      0 --:--:-- --:--:-- --:--:-- 10000
ok - test_numa_topo_topo
ok - test_nvidia_gpu_count
ok - test_nvidia_gpu_throttled
ok - test_nvidia_gpu_unused
not ok - test_nvidia_persistence_status
# Unexpected perfistance status, likely system configuration issue
#  test data value diff:
# --- test_sysinfo.sh.data/g5.8xlarge/nvidia_persistence_status.txt 2024-10-07 22:15:11.000000000 +0000
# +++ /tmp/test_sysinfo.sh.actual-data.tgP/nvidia_persistence_status.txt    2024-10-08 07:37:30.716879902 +0000
# @@ -1,2 +1,2 @@
#  name, pci.bus_id, persistence_mode
# -NVIDIA A10G, 00000000:00:1E.0, Enabled
# +NVIDIA A10G, 00000000:00:1E.0, Disabled
# common.sh:32:_assert_data()
# common.sh:37:assert_data()
# test_sysinfo.sh:52:test_nvidia_persistence_status()
ok - test_nvidia_smi_topo
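
The remaining failure, test_nvidia_persistence_status, comes from the GPU reporting persistence mode as Disabled while the expected data says Enabled. A minimal sketch for spotting this outside the test suite follows. It assumes input in the same CSV layout as the diff above (`name, pci.bus_id, persistence_mode`), and `gpus_without_persistence` is a hypothetical helper name.

```shell
# Hypothetical helper (an assumption, not code from this PR).
# Reads CSV with a "name, pci.bus_id, persistence_mode" header from stdin
# and prints the PCI bus ID of every GPU whose persistence mode is not "Enabled".
gpus_without_persistence() {
  awk -F', ' 'NR > 1 && $3 != "Enabled" { print $2 }'
}
```

On a GPU node it could be fed with `nvidia-smi --query-gpu=name,pci.bus_id,persistence_mode --format=csv | gpus_without_persistence`. Persistence mode can normally be enabled with `nvidia-smi -pm 1` as root, though whether that is the right fix on Bottlerocket, as opposed to updating the expected data, is not established here.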

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.