google-deepmind / dm_aux

Apache License 2.0

Further development | Advice on audio processing #1

Open niemiaszek opened 1 year ago

niemiaszek commented 1 year ago

Is any further development planned? I would find it useful to have a solid audio augmentation and feature-generation library accelerated with JAX.

The most common library, audiomentations, runs purely on CPU, though it has a GPU-accelerated PyTorch sister library, torch-audiomentations. Torchaudio also has good CPU/GPU support, with many ops included.

For TensorFlow I could only find tfio.audio, which is of limited use (mel spectrogram generation, SpecAugment, and trimming, but not many waveform augmentations).

In my use case I would like to apply waveform augmentations to the dataset offline and dump the results into TFRecords as mel spectrograms. I would then load them into TensorFlow, which I currently use in my project, and apply mel spectrogram augmentations there on the fly. At the moment I could use audiomentations with some multiprocessing to do all waveform augmentations on CPU, and then use TFRecords and tf.io while training with TensorFlow.
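
To make the offline waveform step concrete, here is a minimal sketch of what two common augmentations (random gain and additive white noise at a target SNR) could look like in JAX, batched with `vmap`. This is not dm_aux code; the function names, the dB ranges, and the default SNR are all assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp


def random_gain(key, waveform, min_db=-6.0, max_db=6.0):
    """Scale the waveform by a gain drawn uniformly in [min_db, max_db] dB."""
    gain_db = jax.random.uniform(key, (), minval=min_db, maxval=max_db)
    return waveform * 10.0 ** (gain_db / 20.0)


def add_noise_at_snr(key, waveform, snr_db):
    """Add white Gaussian noise so the result has roughly the requested SNR."""
    signal_power = jnp.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = jax.random.normal(key, waveform.shape) * jnp.sqrt(noise_power)
    return waveform + noise


@jax.jit
def augment_batch(key, batch, snr_db=20.0):
    """Apply an independent gain and noise draw to each waveform in the batch.

    `batch` is assumed to have shape (num_waveforms, num_samples).
    """
    gain_key, noise_key = jax.random.split(key)
    n = batch.shape[0]
    gain_keys = jax.random.split(gain_key, n)
    noise_keys = jax.random.split(noise_key, n)
    gained = jax.vmap(random_gain)(gain_keys, batch)
    return jax.vmap(add_noise_at_snr, in_axes=(0, 0, None))(
        noise_keys, gained, snr_db
    )
```

Calling `augment_batch(jax.random.PRNGKey(0), batch)` returns an array of the same shape as `batch`; the mel spectrogram computation and TFRecord writing would still have to happen afterwards.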

Could you share how you approach such problems? Is there any widely supported audio-processing library for the JAX/TensorFlow ecosystem?

svarunid commented 7 months ago

I faced the same difficulty when I started looking for decent audio processing libraries. I guess a simple rewrite of the audiomentations library in JAX would suffice for now. If you're interested, we can join forces and build one!
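
As a rough starting point for such a rewrite, the sketch below mimics the audiomentations pattern of composing transforms that each fire with some probability `p`, using explicit JAX PRNG keys. Everything here (`with_probability`, `compose`, `polarity_inversion`) is a hypothetical name invented for this example, not an existing API in dm_aux or audiomentations.

```python
import jax
import jax.numpy as jnp


def with_probability(transform, p):
    """Wrap transform(key, x) so it is applied with probability p.

    jnp.where is used instead of a Python `if` so the wrapper stays
    compatible with jit and vmap; both branches are computed.
    """
    def wrapped(key, x):
        coin_key, op_key = jax.random.split(key)
        apply = jax.random.bernoulli(coin_key, p)
        return jnp.where(apply, transform(op_key, x), x)
    return wrapped


def compose(*transforms):
    """Chain transforms, giving each one its own PRNG key."""
    def composed(key, x):
        keys = jax.random.split(key, len(transforms))
        for i, transform in enumerate(transforms):
            x = transform(keys[i], x)
        return x
    return composed


def polarity_inversion(key, x):
    """Flip the sign of the waveform (deterministic, so the key is unused)."""
    del key
    return -x


# Example pipeline: invert polarity for about half of the inputs.
augment = jax.jit(compose(with_probability(polarity_inversion, 0.5)))
```

Passing keys explicitly keeps every transform a pure function, which is what lets the whole pipeline be wrapped in `jax.jit` or mapped over a batch with `jax.vmap`.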