cache background_noise rms data

asteroid-team / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

MIT License

928 stars, 87 forks

cache background_noise rms data #145

Open fantasyRqg opened 2 years ago

fantasyRqg commented 2 years ago

Boost background_noise performance.

Reduce audio decoding and file I/O.
Reduce RMS computation. There may be a difference between rms(partial audio) and rms(full audio).
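To illustrate the caveat about rms(partial audio) versus rms(full audio), here is a minimal sketch in plain Python. It is not the code from this PR: the `rms` helper, the synthetic `noise` clip, the file name, and the `rms_cache` dict are all hypothetical and only show why an RMS value cached for a whole file can differ from the RMS of the excerpt that actually gets mixed in.

```python
import math


def rms(samples):
    """Root mean square of a non-empty sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Hypothetical noise clip: a quiet first half followed by a louder second half.
noise = [0.1] * 1000 + [0.5] * 1000

# RMS over the whole file: can be computed once and cached.
full_rms = rms(noise)

# RMS over only the excerpt that would be added to the input audio.
partial_rms = rms(noise[:1000])

# Hypothetical in-memory cache keyed by file path (the path is made up).
rms_cache = {"noise_0001.wav": full_rms}

print(f"full: {full_rms:.4f}, partial: {partial_rms:.4f}")
```

When the noise level varies over the file, scaling by the cached full-file RMS gives a different effective SNR than scaling by the RMS of the chosen excerpt. For fairly stationary noise the two values should be close.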

iver56 commented 2 years ago

Hi fantasyRqg, and thanks for your PR 😃

Just for context, so I understand the problem you're proposing to solve, I want to ask some questions:

How large is your background noise dataset?
If you are training a model, how many workers do you use for preparing the audio examples that go into the training batches?
How much memory (RAM) is there on the computer where you are doing the training?
What audio file format are your background noise files? And do they have the same sample rate as the "clean" input audios that the noises get added to?
Are you using an SSD or a HDD?

Ideally, a good solution would work well in all kinds of combinations of answers to those questions.

fantasyRqg commented 2 years ago

How large is your background noise dataset?

About 2k records.

If you are training a model, how many workers do you use for preparing the audio examples that go into the training batches?

Only one worker. I tried multiple workers, but it was not fast enough.

How much memory (RAM) is there on the computer where you are doing the training?

I cached the samples and the noises. The samples took 7 GB and the noises took 1.5 GB.

What audio file format are your background noise files? And do they have the same sample rate as the "clean" input audios that the noises get added to?

I don't think the audio format and sample rate are a problem. The audio: Audio parameter will take care of that.

Are you using an SSD or a HDD?

HDD

iver56 commented 2 years ago

Thanks for the insight :) Indeed, in your case it makes sense to apply caching like this.

[x] HDD
[x] Not very large dataset - fits in RAM
[x] Single worker

My own use case is quite different, and would actually be best without caching:

[x] SSD
[x] Very large dataset, cannot fit in RAM
[x] Many workers
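One way to serve both use cases could be an opt-in cache with a bounded size. The sketch below is an assumption for illustration, not part of this PR or of the torch-audiomentations API: `BoundedCache`, `get_or_load`, and `fake_load` are hypothetical names, and `fake_load` only stands in for decoding an audio file from disk.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache; max_items=0 disables caching entirely."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()

    def get_or_load(self, key, loader):
        if self.max_items <= 0:
            return loader(key)  # caching disabled: always load from source
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = loader(key)
        self._items[key] = value
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict least recently used
        return value


load_calls = []


def fake_load(path):
    """Stand-in for decoding an audio file; records each 'disk' access."""
    load_calls.append(path)
    return [0.0] * 4


cache = BoundedCache(max_items=2)
for path in ["a.wav", "b.wav", "a.wav", "c.wav", "b.wav"]:
    cache.get_or_load(path, fake_load)

print(load_calls)
```

With a small dataset, `max_items` could be set high enough to hold every noise file. With a very large dataset, it could be set to 0. With many worker processes, each process would typically hold its own copy of such a cache, so the memory cost would multiply with the number of workers.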

I don't think the audio format and sample rate are a problem. The audio: Audio parameter will take care of that.

The reason why I asked is that resampling (in case of mismatch) may take a significant amount of CPU time, slowing down the model training.

I'm currently wrapping up the 0.11 release, and then I'll have some work preparing a few new transforms. After that, I'll hopefully have more time to consider this caching feature. In the meantime, thanks for your patience, and I hope you're okay with using your own fork for now.