Problem Description
If I deploy a pool with Standard_NC4as_T4_v3 without the gpu:nvidia_driver:source specification in pool.yaml, the pool is created successfully but the NVIDIA drivers are not installed.
If I specify gpu:nvidia_driver:source, I get an error: local variable 'gpu_driver' referenced before assignment
The same pool.yaml works fine with Standard_NC6s_v3.
Batch Shipyard Version
3.9.1
Steps to Reproduce
Try to deploy a pool with Standard_NC4as_T4_v3, with gpu:nvidia_driver:source specified in pool.yaml.
Expected Results
Pool is deployed
Actual Results
An error is returned when the gpu:nvidia_driver:source specification is provided in pool.yaml:
2021-09-21 09:02:21.573 INFO - uploading file /tmp/_MEIRpaARG/scripts/shipyard_docker_exec_task_runner.sh as 'shipyard_docker_exec_task_runner.sh'
Traceback (most recent call last):
  File "shipyard.py", line 3136, in <module>
  File "site-packages/click/core.py", line 764, in __call__
  File "site-packages/click/core.py", line 717, in main
  File "site-packages/click/core.py", line 1137, in invoke
  File "site-packages/click/core.py", line 1137, in invoke
  File "site-packages/click/core.py", line 956, in invoke
  File "site-packages/click/core.py", line 555, in invoke
  File "site-packages/click/decorators.py", line 64, in new_func
  File "site-packages/click/core.py", line 555, in invoke
  File "shipyard.py", line 1546, in pool_add
  File "convoy/fleet.py", line 3451, in action_pool_add
  File "convoy/fleet.py", line 1849, in _add_pool
  File "convoy/fleet.py", line 1555, in _construct_pool_object
UnboundLocalError: local variable 'gpu_driver' referenced before assignment
[9269] Failed to execute script shipyard
Additional Comments
I also tried with source: https://us.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run, which deploys without issues on other NC series (e.g., NC6s v3), and got the same error.
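For context, an UnboundLocalError like the one in the traceback usually means a local variable is assigned only inside conditional branches, and none of them ran before the variable was read. The sketch below shows that pattern and one way to guard against it. It is a hypothetical illustration, not the actual code in convoy/fleet.py: the function names, the VM-size prefix check, and the placeholder default driver string are all assumptions made for the example.

```python
def pick_gpu_driver_buggy(vm_size, source=None):
    """Hypothetical: gpu_driver is bound only for a recognized VM size."""
    size = vm_size.lower()
    if size.startswith("standard_nc6"):  # assumed prefix check, for illustration
        gpu_driver = source or "default-compute-driver"  # placeholder value
    # No fallback branch: an unrecognized size leaves gpu_driver unbound,
    # so the next line raises UnboundLocalError.
    return gpu_driver


def pick_gpu_driver_safe(vm_size, source=None):
    """Hypothetical fix: bind the variable before any branching."""
    gpu_driver = None
    size = vm_size.lower()
    if size.startswith("standard_nc6"):
        gpu_driver = source or "default-compute-driver"
    elif source is not None:
        # Unrecognized size, but the user supplied an explicit driver source.
        gpu_driver = source
    return gpu_driver
```

With these definitions, pick_gpu_driver_buggy("Standard_NC4as_T4_v3", source) raises UnboundLocalError, while the same call with "Standard_NC6s_v3" returns the source. That matches the symptom reported here, though the real cause in Batch Shipyard may differ.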