Description of changes:
Issue 1:
Build failed in CodeBuild
Step 17/26 : ARG NCCL_VERSION=2.22.3-1+cuda${CUDA_MAJOR_VERSION}.${CUDA_MINOR_VERSION}
---> Running in 1a06f4ac470e
Removing intermediate container 1a06f4ac470e
---> 07cde702fb9b
Step 18/26 : RUN apt update && apt install -y libnccl2=${NCCL_VERSION} libnccl-dev=${NCCL_VERSION}
...
E: Version '2.22.3-1' for 'libnccl2' was not found
E: Version '2.22.3-1' for 'libnccl-dev' was not found
The command '/bin/sh -c apt update && apt install -y libnccl2=${NCCL_VERSION} libnccl-dev=${NCCL_VERSION}' returned a non-zero code: 100
Temporary fix: hardcode the NCCL version.
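Before pinning a version, it can help to confirm that the apt repository actually offers it. The following is a minimal sketch, not part of this PR: `has_version` is a hypothetical helper that reads `apt-cache madison`-style lines (`package | version | source`) on stdin and succeeds only if the requested version is listed exactly. The sample line is made-up data for illustration.

```shell
# Hypothetical helper (not part of this PR): succeed if the exact version
# given as $1 appears in `apt-cache madison`-style input on stdin.
has_version() {
  awk -F'|' -v want="$1" '
    {
      v = $2
      gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the version field
      if (v == want) found = 1
    }
    END { exit !found }
  '
}

# Made-up sample line in madison format, for illustration only.
sample=' libnccl2 | 1.2.3-1+cuda0.0 | example-repo Packages'

if printf '%s\n' "$sample" | has_version '1.2.3-1+cuda0.0'; then
  echo "version available"
fi

# The match is exact, so a version string without its suffix is not found.
if ! printf '%s\n' "$sample" | has_version '1.2.3-1'; then
  echo "version not available"
fi

# In a real image this would be fed from apt, for example:
#   apt-cache madison libnccl2 | has_version "$NCCL_VERSION"
```

Running a check like this in the build before `apt install` would fail early with a clearer message than the apt error above.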
Issue 2:
The unit test test_nvidia_persistence_status is failing on Bottlerocket because persistence is not enabled there; an upcoming release will fix it. In the meantime, this change adds a flag to skip tests by pattern, for flexibility.
k logs unit-test-job-77xkh -f
# Running tests in gpu_unit_tests/tests/test_basic.sh
ok - test_01_device_query
ok - test_02_vector_add
ok - test_03_bandwidth
ok - test_04_bus_grind
ok - # skip skip pattern: test_05_dcgm_diagnostics|test_nvidia_persistence_status
# Running tests in gpu_unit_tests/tests/test_sysinfo.sh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:02 --:--:-- 0
curl: (56) Recv failure: Connection reset by peer
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 10 100 10 0 0 13717 0 --:--:-- --:--:-- --:--:-- 10000
ok - test_numa_topo_topo
ok - test_nvidia_gpu_count
ok - test_nvidia_gpu_throttled
ok - test_nvidia_gpu_unused
ok - # skip skip pattern: test_05_dcgm_diagnostics|test_nvidia_persistence_status
ok - test_nvidia_smi_topo
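For illustration, pattern-based skipping like the one in the log above could be sketched as follows. This is a minimal sketch under assumptions, not the implementation in this PR: `should_skip` and `run_tests` are hypothetical names, the pattern is treated as an extended regular expression (so `|` separates alternatives), and the two test functions are stand-ins.

```shell
# Hypothetical sketch: should_skip and run_tests are illustrative names,
# not the flag or runner used in this PR.

# Succeed if test name $1 matches skip pattern $2 (extended regex).
# An empty pattern never skips anything.
should_skip() {
  [ -n "$2" ] && printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Usage: run_tests PATTERN TEST...
# Runs each test function, or reports it as skipped when it matches PATTERN.
run_tests() {
  skip_pattern="$1"
  shift
  for t in "$@"; do
    if should_skip "$t" "$skip_pattern"; then
      echo "ok - # skip $t"
    elif "$t"; then
      echo "ok - $t"
    else
      echo "not ok - $t"
    fi
  done
}

# Stand-in tests for the demo: one passes, one would fail if it ran.
test_nvidia_gpu_count() { true; }
test_nvidia_persistence_status() { false; }

run_tests 'test_05_dcgm_diagnostics|test_nvidia_persistence_status' \
  test_nvidia_gpu_count test_nvidia_persistence_status
```

Because a skipped test is never executed, the failing stand-in is reported as skipped rather than as `not ok`, and names in the pattern that match no test are simply ignored.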
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.