Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Deep Speech 2 Version 3.0 fails on PyTorch 1.10 / TorchAudio 0.10 #1549

Closed davidslater closed 2 years ago

davidslater commented 2 years ago

Describe the bug The Deep Speech 2 model fails when using V3.0, at least with torch 1.10 / torchaudio 0.10.

It fails at this line: https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/art/estimators/speech_recognition/pytorch_deep_speech.py#L791

spectrogram, _ = torchaudio.functional.magphase(transformed_input)

I get the following stack trace on both GPU and CPU:

  File "/workspace/armory/scenarios/scenario.py", line 267, in run_benign                                                                                                            
    y_pred = self.model.predict(x, **self.predict_kwargs)                                                                                                                            
  File "/workspace/art/estimators/speech_recognition/pytorch_deep_speech.py", line 364, in predict                                                                                   
    inputs, _, input_rates, _, batch_idx = self._transform_model_input(x=x_preprocessed)                                                                                             
  File "/workspace/art/estimators/speech_recognition/pytorch_deep_speech.py", line 795, in _transform_model_input                                                                    
    spectrogram, _ = torchaudio.functional.magphase(transformed_input)                                                                                                               
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_internal/module_utils.py", line 58, in wrapped                                                                            
    return func(*args, **kwargs)                                                                                                                                                     
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 795, in magphase                                                                           
    phase = angle(complex_tensor)                                                                                                                                                    
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_internal/module_utils.py", line 58, in wrapped                                                                            
    return func(*args, **kwargs)                                                                                                                                                     
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 771, in angle                                                                              
    return torch.atan2(complex_tensor[..., 1], complex_tensor[..., 0])                                                                                                               
RuntimeError: "atan2_cuda" not implemented for 'ComplexFloat'  

It is preceded by this deprecation warning for torchaudio.functional.magphase:

/workspace/art/estimators/speech_recognition/pytorch_deep_speech.py:795: UserWarning: torchaudio.functional.functional.magphase has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs` and `torch.angle`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  spectrogram, _ = torchaudio.functional.magphase(transformed_input)
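For reference, the deprecated `magphase` computes magnitude and phase from a real tensor whose trailing dimension holds (real, imag) pairs, while the replacement the warning suggests operates on a native complex tensor via `torch.view_as_complex`, `torch.abs`, and `torch.angle`. A minimal sketch of the equivalence, with NumPy standing in for torch (the function names below are illustrative, not from ART or torchaudio):

```python
import numpy as np

def magphase_legacy(x):
    """Deprecated-style magphase: x is real with a trailing (re, im) dim."""
    mag = np.sqrt(x[..., 0] ** 2 + x[..., 1] ** 2)
    phase = np.arctan2(x[..., 1], x[..., 0])
    return mag, phase

def magphase_complex(z):
    """Suggested replacement: z is a native complex array."""
    return np.abs(z), np.angle(z)

# A fake (2, 3)-bin spectrogram stored in both layouts.
rng = np.random.default_rng(0)
pairs = rng.standard_normal((2, 3, 2)).astype(np.float32)  # (..., 2) real layout
z = pairs[..., 0] + 1j * pairs[..., 1]                     # complex layout

m1, p1 = magphase_legacy(pairs)
m2, p2 = magphase_complex(z)
assert np.allclose(m1, m2) and np.allclose(p1, p2)
```

The runtime error arises because the failing line feeds a native complex tensor into the legacy path: indexing `[..., 0]` / `[..., 1]` on a complex tensor yields complex values, and `atan2` is not implemented for ComplexFloat.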

To Reproduce

# First, install github.com/SeanNaren/deepspeech.pytorch@V3.0
from art.estimators.speech_recognition import PyTorchDeepSpeech
import numpy as np
model = PyTorchDeepSpeech(pretrained_model="librispeech")
assert model._version == 3
x = np.random.random((1, 238400)).astype(np.float32)
model.predict(x)

Expected behavior Prediction should succeed, i.e. the call should simply compute the magnitude representation of the complex input.
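One version-robust fix would branch on the spectrogram's dtype: take `torch.abs` directly when the transform already returns a native complex tensor (torchaudio >= 0.10), and fall back to the (re, im)-pair layout otherwise. A hedged NumPy sketch of that guard (the torch equivalents would be `torch.is_complex` and `torch.abs`; `magnitude` is an illustrative helper, not ART's API):

```python
import numpy as np

def magnitude(spec):
    """Magnitude of a spectrogram in either storage layout."""
    if np.iscomplexobj(spec):
        # torchaudio >= 0.10: the STFT returns a native complex tensor.
        return np.abs(spec)
    # Older layout: real tensor with a trailing (re, im) dimension,
    # as the deprecated magphase expected.
    return np.sqrt(spec[..., 0] ** 2 + spec[..., 1] ** 2)

z = np.array([[3 + 4j, 1 + 0j]])
pairs = np.stack([z.real, z.imag], axis=-1)
assert np.allclose(magnitude(z), magnitude(pairs))  # both give [[5., 1.]]
```

This keeps backward compatibility with older torchaudio releases while avoiding the deprecated `magphase` on newer ones.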

System information (please complete the following information):

beat-buesser commented 2 years ago

Hi @davidslater Thank you very much!