Steps to reproduce
1. Open sdk/python/foundation-models/system/inference/automatic-speech-recognition/asr-online-endpoint.ipynb
2. Change the machine type from Standard_DS5_v2 to Standard_NC8as_T4_v3
3. Deploy the endpoint
4. Run inference with response = workspace_ml_client.online_endpoints.invoke( ... )
Expected behavior
Audio is transcribed
Actual behavior
Exception has occurred: Exception
Expected a torch.device with a specified index or an integer, but got:cuda
StopIteration: 0
During handling of the above exception, another exception occurred:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
File "/Users/danielve/azureml-examples/inference/automatic-speech-recognition/asr-online-endpoint-inference.py", line 28, in <module>
response = workspace_ml_client.online_endpoints.invoke(
Exception: Expected a torch.device with a specified index or an integer, but got:cuda
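As an aside on the traceback: the json.decoder.JSONDecodeError on top of the server-side exception usually means the endpoint returned a plain-text error body (the torch.device message) rather than JSON, and the subsequent json.loads on the response then fails. A minimal sketch of decoding the response defensively so the real error surfaces; the helper name is hypothetical, not part of the SDK:

```python
import json


def decode_endpoint_response(body: str):
    """Decode an online-endpoint response, surfacing non-JSON error bodies.

    When the deployment raises (as in this issue), the body is often a
    plain error string, and json.loads fails with
    'Expecting value: line 1 column 1 (char 0)'.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Re-raise with the raw body so the server-side error is visible.
        raise RuntimeError(f"Endpoint returned a non-JSON body: {body!r}")


print(decode_endpoint_response('{"text": "hello"}'))  # {'text': 'hello'}
```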
Additional information
This issue seems to arise when using GPUs. I have enough quota, so that shouldn't be the problem.
Operating System
Linux
Version Information
Python Version: 3.9.6
azure-ai-ml package version: 1.8.0