I get the following stack trace on both GPU and CPU:
  File "/workspace/armory/scenarios/scenario.py", line 267, in run_benign
    y_pred = self.model.predict(x, **self.predict_kwargs)
  File "/workspace/art/estimators/speech_recognition/pytorch_deep_speech.py", line 364, in predict
    inputs, _, input_rates, _, batch_idx = self._transform_model_input(x=x_preprocessed)
  File "/workspace/art/estimators/speech_recognition/pytorch_deep_speech.py", line 795, in _transform_model_input
    spectrogram, _ = torchaudio.functional.magphase(transformed_input)
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_internal/module_utils.py", line 58, in wrapped
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 795, in magphase
    phase = angle(complex_tensor)
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_internal/module_utils.py", line 58, in wrapped
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 771, in angle
    return torch.atan2(complex_tensor[..., 1], complex_tensor[..., 0])
RuntimeError: "atan2_cuda" not implemented for 'ComplexFloat'
Before the stack trace, I also get a deprecation warning for torchaudio.functional.magphase:
/workspace/art/estimators/speech_recognition/pytorch_deep_speech.py:795: UserWarning: torchaudio.functional.functional.magphase has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs` and `torch.angle`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  spectrogram, _ = torchaudio.functional.magphase(transformed_input)
To Reproduce
# First, install github.com/SeanNaren/deepspeech.pytorch@V3.0
from art.estimators.speech_recognition import PyTorchDeepSpeech
import numpy as np
model = PyTorchDeepSpeech(pretrained_model="librispeech")
assert model._version == 3
x = np.random.random((1, 238400)).astype(np.float32)
model.predict(x)
Expected behavior
Should just generate the magnitude representation of the complex input.
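For illustration, here is a minimal sketch of what I would expect instead, following the hint in the deprecation warning. The 'ComplexFloat' in the error suggests the tensor passed to magphase is already a native complex tensor, so the sketch handles both that case and the legacy (real, imag) layout. The helper name `magnitude` and the STFT parameters are my own assumptions for this example, not code from ART or torchaudio.

```python
import torch


def magnitude(spec: torch.Tensor) -> torch.Tensor:
    """Return the magnitude of an STFT result (hypothetical helper, not part of ART)."""
    if torch.is_complex(spec):
        # Native complex tensor, e.g. from torch.stft(..., return_complex=True).
        return torch.abs(spec)
    # Legacy layout: real tensor whose last dimension holds (real, imag).
    return torch.abs(torch.view_as_complex(spec.contiguous()))


# Usage on a random signal; the STFT parameters below are arbitrary assumptions.
signal = torch.rand(1, 16000)
window = torch.hann_window(320)
stft = torch.stft(signal, n_fft=320, hop_length=160, window=window, return_complex=True)

mag_native = magnitude(stft)                      # complex input
mag_legacy = magnitude(torch.view_as_real(stft))  # (real, imag) input
```

Both calls should give the same real-valued spectrogram, so something along these lines might replace the magphase call without depending on the deprecated function.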
System information (please complete the following information):
OS - Ubuntu 20.04
Python version - 3.8.10
ART version or commit number - 1.9.1
TensorFlow / Keras / PyTorch / MXNet version - torch is 1.10.2, torchaudio is 0.10.2
Describe the bug
Deep Speech 2 model fails when using V3.0, at least for torch 1.10 / torchaudio 1.11.
It fails at this line: https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/art/estimators/speech_recognition/pytorch_deep_speech.py#L791