asteroid-team / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
MIT License
969 stars, 88 forks

feat: update ApplyBackgroundNoise augmentation #48

Closed by hbredin 4 years ago

hbredin commented 4 years ago

This is a work in progress but I'd love to receive early feedback anyway.

iver56 commented 4 years ago

I can take another look when it's out of draft :)

iver56 commented 4 years ago

There's a merge conflict, and this is the reason: https://github.com/asteroid-team/torch-audiomentations/pull/50/files#diff-6e77014151191ab9ff2d304e38e00227aacf5f13c96d97b95bb4ace36bf834c1

hbredin commented 4 years ago

This PR now fails because the tests try to augment 2D samples. Shouldn't we first open a PR that switches to 3D only (as discussed last week on Slack)?

iver56 commented 4 years ago

> This PR now fails because the tests try to augment 2D samples. Shouldn't we first open a PR that switches to 3D only (as discussed last week on Slack)?

Yes, we should adapt (or remove) tests that currently provide 2d input. We can enforce/assert 3d input in a different pull request.

iver56 commented 4 years ago

I tried to use this transform in the demo script. I proposed a few changes in your branch here: https://github.com/hbredin/torch-audiomentations/pull/1

I'm currently getting an exception like this:

Traceback (most recent call last):
  File "C:/Users/Iver/Code/torch-audiomentations/scripts/demo.py", line 139, in <module>
    samples=samples, sample_rate=SAMPLE_RATE
  File "C:\Users\Iver\Anaconda3\envs\torch-audiomentations-gpu\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\core\transforms_interface.py", line 189, in forward
    self.randomize_parameters(cloned_samples, sample_rate)
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\augmentations\background_noise.py", line 113, in randomize_parameters
    [self.random_background(audio, num_samples) for _ in range(batch_size)]
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\augmentations\background_noise.py", line 113, in <listcomp>
    [self.random_background(audio, num_samples) for _ in range(batch_size)]
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\augmentations\background_noise.py", line 80, in random_background
    0, background_num_samples - missing_num_samples
  File "C:\Users\Iver\Anaconda3\envs\torch-audiomentations-gpu\lib\random.py", line 222, in randint
    return self.randrange(a, b+1)
  File "C:\Users\Iver\Anaconda3\envs\torch-audiomentations-gpu\lib\random.py", line 195, in randrange
    raise ValueError("non-integer stop for randrange()")
ValueError: non-integer stop for randrange()

Edit: I think it's because get_num_samples sometimes doesn't return an int
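If that guess is right, one possible fix is to coerce the sample counts to int before they reach random.randint, which requires integer bounds. The sketch below is only an illustration under that assumption: to_sample_count and pick_background_offset are hypothetical names, not functions from torch-audiomentations, and only background_num_samples and missing_num_samples are taken from the traceback above.

```python
import random


def to_sample_count(duration_seconds: float, sample_rate: int) -> int:
    """Hypothetical helper: convert a duration to a whole number of samples."""
    # Rounding first avoids truncating values such as 4409.999999 down to 4409.
    return int(round(duration_seconds * sample_rate))


def pick_background_offset(background_num_samples, missing_num_samples) -> int:
    """Hypothetical helper: pick a random start offset inside the background file.

    Both counts are coerced to int so that random.randint never receives
    a float bound, whatever the caller passed in.
    """
    stop = int(background_num_samples) - int(missing_num_samples)
    if stop < 0:
        raise ValueError(
            f"background too short: {background_num_samples} samples available, "
            f"{missing_num_samples} needed"
        )
    # stop is an int here, so randint(0, stop) is always a valid call.
    return random.randint(0, stop)
```

Returning an int from the function that computes the sample count is probably the cleaner place for the fix, since every caller then benefits; the coercion inside pick_background_offset is only a second line of defence.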

iver56 commented 4 years ago

I tried to run the demo script, and I got an exception like this:

Traceback (most recent call last):
  File "C:/Users/Iver/Code/torch-audiomentations/scripts/demo.py", line 139, in <module>
    samples=samples, sample_rate=SAMPLE_RATE
  File "C:\Users\Iver\Anaconda3\envs\torch-audiomentations-gpu\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\core\transforms_interface.py", line 189, in forward
    self.randomize_parameters(cloned_samples, sample_rate)
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\augmentations\background_noise.py", line 113, in randomize_parameters
    [self.random_background(audio, num_samples) for _ in range(batch_size)]
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\augmentations\background_noise.py", line 113, in <listcomp>
    [self.random_background(audio, num_samples) for _ in range(batch_size)]
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\augmentations\background_noise.py", line 84, in random_background
    background_path, sample_offset=sample_offset, num_samples=num_samples,
  File "C:\Users\Iver\Code\torch-audiomentations\torch_audiomentations\utils\io.py", line 240, in __call__
    raise ValueError()
ValueError

Somehow the numbers don't add up in this case, and I think it's related to signal and noise having different original sample rates

        # io.py line 239-240
        if original_sample_offset + original_num_samples > original_total_num_samples:
            raise ValueError()

Could you try to run the demo script and see if you can reproduce it?

python -m scripts.demo
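One plausible way for the check above to trip, assuming the sample-rate guess is correct, is that the offset and length are computed at the target sample rate and then converted to the file's original rate, where rounding (or a length estimated at the wrong rate) pushes the window past the end of the file. The sketch below shows a conversion that cannot overshoot. to_original_window is a hypothetical function, not the code in io.py; only the three original_* variable names are taken from the excerpt.

```python
import math


def to_original_window(
    sample_offset: int,
    num_samples: int,
    sample_rate: int,
    original_sample_rate: int,
    original_total_num_samples: int,
):
    """Hypothetical helper: map a window at the target rate to the file's own rate.

    Returns (original_sample_offset, original_num_samples) such that
    original_sample_offset + original_num_samples <= original_total_num_samples.
    """
    ratio = original_sample_rate / sample_rate

    # Round the offset down and the length up so the window is fully covered.
    original_sample_offset = int(sample_offset * ratio)
    original_num_samples = int(math.ceil(num_samples * ratio))

    # Clamp both values so rounding can never push the window past the end.
    original_sample_offset = min(max(original_sample_offset, 0), original_total_num_samples)
    original_num_samples = min(
        original_num_samples, original_total_num_samples - original_sample_offset
    )
    return original_sample_offset, original_num_samples
```

Clamping silently returns a shorter window when the request runs past the end of the file, so the caller would need to pad the result. If failing loudly is preferred, keeping the raise but giving ValueError a message that includes the three numbers would make reports like this one easier to diagnose.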

iver56 commented 4 years ago

LGTM! 🚀 Thanks for the contribution 😄