asteroid-team / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
MIT License

RuntimeError: torchaudio::sox_io_get_info() Expected a value of type 'str' for argument '_0' but instead found type 'PosixPath'. #179

yalishanda42 closed 3 weeks ago

yalishanda42 commented 3 weeks ago

Hello, I think there is an issue when using AddBackgroundNoise. Whether I supply a list of Path objects or a list of str paths to the files, I get the same error.

I noticed it happens only when using CUDA. On CPU I cannot reproduce it.

Logs:

Traceback (most recent call last):
    ... 
    <REDACTED - my code>
    ...
    perturbed_audio_samples = base_augmentations(X, sample_rate=sample_rate)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch_audiomentations/core/composition.py", line 120, in forward
    inputs = self.transforms[i](**inputs)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch_audiomentations/core/transforms_interface.py", line 334, in forward
    self.randomize_parameters(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch_audiomentations/augmentations/background_noise.py", line 126, in randomize_parameters
    [self.random_background(audio, num_samples) for _ in range(batch_size)]
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch_audiomentations/augmentations/background_noise.py", line 126, in <listcomp>
    [self.random_background(audio, num_samples) for _ in range(batch_size)]
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch_audiomentations/augmentations/background_noise.py", line 85, in random_background
    background_num_samples = audio.get_num_samples(background_path)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch_audiomentations/utils/io.py", line 131, in get_num_samples
    num_samples, sample_rate = self.get_audio_metadata(file)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch_audiomentations/utils/io.py", line 97, in get_audio_metadata
    info = torchaudio.info(file_path)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 98, in info
    return backend.info(uri, format, buffer_size)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torchaudio/_backend/sox.py", line 20, in info
    sinfo = torch.ops.torchaudio.sox_io_get_info(uri, format)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})

RuntimeError: torchaudio::sox_io_get_info() Expected a value of type 'str' for argument '_0' but instead found type 'PosixPath'.
Position: 0
Value: PosixPath('<REDACTED - absolute path>/18_11874_chunk_1.wav')
Declaration: torchaudio::sox_io_get_info(str _0, str? _1) -> (int _0, int _1, int _2, int _3, str _4)
Cast error details: Unable to cast Python instance of type <class 'pathlib.PosixPath'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

I looked at the library code, specifically find_audio_files_in_paths, and saw that it always converts the file paths it finds or receives to Path objects and stores them that way. But the error message says it expects a str?

Would the issue be fixed if the paths were kept as str, and would that cause other problems? I wanted to check with you first before forking and trying things out. If the fix turns out to be that simple, I can volunteer to implement it.
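To make the "keep them as str" idea concrete, here is a minimal sketch of a file-discovery helper that returns plain strings. The name collect_audio_files_as_str and its extensions parameter are hypothetical and are not the library's find_audio_files_in_paths. It only illustrates the idea and does not show whether other parts of the library rely on Path objects.

```python
import os
from pathlib import Path
from typing import Iterable, List, Sequence, Union


def collect_audio_files_as_str(
    paths: Iterable[Union[str, Path]],
    extensions: Sequence[str] = (".wav",),
) -> List[str]:
    """Hypothetical helper: gather audio file paths and return them as str."""
    found: List[str] = []
    for entry in paths:
        entry = Path(entry)
        if entry.is_dir():
            # Walk the directory recursively, in a stable order.
            candidates = sorted(c for c in entry.rglob("*") if c.is_file())
        else:
            # A single entry is taken as given (no existence check here).
            candidates = [entry]
        for candidate in candidates:
            if candidate.suffix.lower() in extensions:
                # os.fspath turns a Path into its plain str form.
                found.append(os.fspath(candidate))
    return found
```

Every element of the returned list is a str, so it could be passed directly to a function that rejects Path objects.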

Versions: Pytorch == 2.1.2 CUDA == 12.1 torch-audiomentations == 0.11.1

yalishanda42 commented 3 weeks ago

A simpler fix might be to just convert to str only for the torchaudio.info(...) call. Going to try this out now.
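A minimal sketch of that call-site conversion, assuming the only problem is that the sox backend op accepts str and nothing else. The wrapper info_with_str_path and the stand-in strict_info are hypothetical names used for illustration. This is not the patch that was applied to the library.

```python
import os
from pathlib import Path
from typing import Any, Callable, Union


def info_with_str_path(info_fn: Callable[[str], Any], file_path: Union[str, Path]) -> Any:
    """Hypothetical wrapper: convert a path-like value to str before the call."""
    # os.fspath returns a str unchanged and converts a Path to its str form.
    return info_fn(os.fspath(file_path))


def strict_info(uri: Any) -> str:
    """Stand-in for a str-only function, mimicking the type check in the traceback."""
    if not isinstance(uri, str):
        raise TypeError(f"Expected a value of type 'str' but found {type(uri).__name__}")
    return uri


# Passing a Path directly fails, while going through the wrapper succeeds.
example = Path("some_dir") / "noise.wav"
converted = info_with_str_path(strict_info, example)
```

With torchaudio installed, the same wrapper could be called as info_with_str_path(torchaudio.info, file_path), so the rest of the code can keep using Path objects.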

iver56 commented 3 weeks ago

Thanks for reporting this and trying to fix it. If you can't fix it, another option could be to use AddBackgroundNoise in audiomentations.