Implement time stretch transform

asteroid-team / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

MIT License

Implement time stretch transform #68

Open iver56 opened 3 years ago

KentoNishi commented 3 years ago

this sounds like something that wouldn't be too bad to work on, since pitch shifting is already half time stretching anyway!

iver56 commented 3 years ago

Agreed. A contribution would be welcome :)

iver56 commented 3 years ago

https://github.com/KentoNishi/torch-time-stretch

KentoNishi commented 3 years ago

I might do this when I'm free! I'm a little (severe understatement) short on time at the moment but it shouldn't be too bad to implement (famous last words)

iver56 commented 3 years ago

That's cool :) Btw, here's a related idea: a transform that combines time stretching and pitch shifting in a single operation, so it takes roughly the same execution time as time stretching alone
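One way to read the idea above, as a rough sketch rather than anything the thread specifies: assume pitch shifting is built from a pitch-preserving stretch followed by a resample. A combined transform would then only need to change the stretch factor, so it still costs one stretch plus one resample. The names `CombinedPlan`, `plan_stretch_and_shift` and `net_effect` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class CombinedPlan:
    # Duration multiplier applied by the pitch-preserving stretch.
    vocoder_stretch: float
    # Playback speed-up applied by the resampling step.
    resample_speedup: float


def plan_stretch_and_shift(duration_factor: float, semitones: float) -> CombinedPlan:
    """Fold a time stretch and a pitch shift into one stretch plus one resample.

    Assumption: resampling by `pitch_ratio` multiplies pitch by that ratio and
    divides duration by it, so the stretch has to over-compensate.
    """
    pitch_ratio = 2.0 ** (semitones / 12.0)
    return CombinedPlan(
        vocoder_stretch=duration_factor * pitch_ratio,
        resample_speedup=pitch_ratio,
    )


def net_effect(plan: CombinedPlan) -> tuple:
    """Return (duration multiplier, pitch multiplier) after both steps."""
    duration = plan.vocoder_stretch / plan.resample_speedup
    pitch = plan.resample_speedup
    return duration, pitch


# 25% longer and one octave up, using a single stretch and a single resample.
plan = plan_stretch_and_shift(duration_factor=1.25, semitones=12)
duration, pitch = net_effect(plan)
```

With `duration_factor=1.0` the plan reduces to a plain pitch shift, which is the sense in which pitch shifting already contains a time stretch.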

KentoNishi commented 3 years ago

wow i would not have thought of that one myself, that's quite genius 🧠

iver56 commented 3 years ago

I'm glad you liked my idea ^^ Should I create an issue for that, then?

KentoNishi commented 3 years ago

Yep!

roses are red
pitch-shift was merged to HEAD
time-stretch separately?
why not both instead?

iver56 commented 3 years ago

The idea is ambitious, new and deep,
But we also have other promises to keep.
The issue number is #101,
Let's hope it gets picked up by someone.
Too bad that we are limited on time,
but at least we have time for a rhyme

KentoNishi commented 3 years ago

oh my god this is beautiful lmaooooooooooooooooooooooooooo i love it

akashrajkn commented 2 years ago

Hi @iver56! First of all, really nice that you are maintaining this project :) I work with audio AI models a lot and use audiomentations for many of them.

Since TimeStretch doesn't exist yet, I implemented a class for it, following torch-time-stretch and audiomentations. Because this transform changes the length of the audio, this snippet of code from core.transforms_interface raises an error:

if self.mode == "per_example":
    if not self.are_parameters_frozen:
        self.randomize_parameters(selected_samples, sample_rate)

    cloned_samples[    # <--- the error is raised by this assignment
        self.transform_parameters["should_apply"]
    ] = self.apply_transform(selected_samples, sample_rate)

Error: RuntimeError: shape mismatch: value tensor of shape [xxx] cannot be broadcast to indexing result of shape [1, 1, yyy]

How would you address this? Thank you in advance
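The thread does not settle on an answer, but the mismatch itself is easy to reproduce outside the library, and one possible workaround is to crop or zero-pad the transformed audio back to its original length before the masked assignment. The sketch below assumes that approach. `fit_to_length` is a hypothetical helper, not part of torch-audiomentations, and `F.interpolate` is only a stand-in for a length-changing transform (it resamples, so it is not a real time stretch).

```python
import torch
import torch.nn.functional as F


def fit_to_length(samples: torch.Tensor, target_length: int) -> torch.Tensor:
    """Crop, or right-pad with zeros, along the last (time) axis."""
    current_length = samples.shape[-1]
    if current_length > target_length:
        return samples[..., :target_length]
    return F.pad(samples, (0, target_length - current_length))


# Batch of 4 mono examples with 16000 samples each: (batch, channels, time).
batch = torch.zeros(4, 1, 16000)
should_apply = torch.tensor([True, False, True, False])
selected = batch[should_apply]  # shape (2, 1, 16000)

# Stand-in for a transform that makes the audio 25% longer.
stretched = F.interpolate(
    selected, scale_factor=1.25, mode="linear", align_corners=False
)

out = batch.clone()
mismatch_raised = False
try:
    # Same pattern as the snippet above: the value is longer than the slot.
    out[should_apply] = stretched
except RuntimeError:
    mismatch_raised = True

# Workaround: restore the original length so the assignment fits.
out[should_apply] = fit_to_length(stretched, batch.shape[-1])
```

The trade-off is that stretched audio loses its tail and compressed audio gains trailing silence. The alternative, letting the batch length change, would require the transform interface to return a new tensor rather than assign into `cloned_samples`.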

iver56 commented 2 years ago

Hey :) I will check this out in a few days.

Nice profile pic btw ^^

akashrajkn commented 2 years ago

Thanks!

I've submitted a PR (it is not complete yet) so you can view the code.