Can DALI make the length of each sample uniform while decoding the audio file?

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html

Apache License 2.0

Can DALI make the length of each sample uniform while decoding the audio file? #4372

Open byungsoo-oh opened 2 years ago

byungsoo-oh commented 2 years ago

Hi, I would like to decode audio files (WAV format) so that every sample has the same length.

Specifically, I am trying to reproduce the behavior of tf.audio.decode_wav() in a DALI pipeline.

The desired_samples parameter of tf.audio.decode_wav() crops or pads the audio to the requested length, but nvidia.dali.fn.decoders.audio does not seem to provide such an option.

I looked for a DALI operator that crops an audio sample to a desired length, but I could not find one in the operation reference in the DALI docs.

Is there any way I can perform the desired task using DALI?

Thanks a lot.

JanuszL commented 2 years ago

Hi @byungsoo-oh,

You can use the slice operator for this, but the sample has to be decoded in full first and then trimmed (or padded with the pad operator). Still, this sounds like a good idea for an enhancement. Can you tell me more about your particular use case?
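
For reference, the crop-or-pad step described above can be sketched in plain NumPy. This is only an illustration of the target behavior (the same idea as desired_samples in tf.audio.decode_wav()), not DALI code: the helper name crop_or_pad and its fill_value parameter are hypothetical, and the input is assumed to be an already decoded array with time on axis 0. In a DALI pipeline the same effect would come from applying the slice and pad operators to the output of nvidia.dali.fn.decoders.audio; the exact operator arguments should be checked against the DALI docs.

```python
import numpy as np


def crop_or_pad(audio: np.ndarray, desired_samples: int,
                fill_value: float = 0.0) -> np.ndarray:
    """Return `audio` with exactly `desired_samples` entries along axis 0.

    Hypothetical helper: longer inputs are trimmed from the end,
    shorter inputs are padded at the end with `fill_value`.
    """
    n = audio.shape[0]
    if n >= desired_samples:
        # Trim: keep only the first `desired_samples` samples.
        return audio[:desired_samples]
    # Pad: append samples on the time axis only, leave other axes alone.
    pad_width = [(0, desired_samples - n)] + [(0, 0)] * (audio.ndim - 1)
    return np.pad(audio, pad_width, mode="constant",
                  constant_values=fill_value)


# Stand-ins for fully decoded mono clips of different lengths.
long_clip = np.arange(10, dtype=np.float32)
short_clip = np.arange(3, dtype=np.float32)

fixed_long = crop_or_pad(long_clip, 5)    # trimmed to 5 samples
fixed_short = crop_or_pad(short_clip, 5)  # zero-padded to 5 samples
```

A reference like this can also be used to check that the output of a DALI pipeline has the expected length and padding values.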

byungsoo-oh commented 2 years ago

Hi @JanuszL, thank you for the kind reply! I solved the issue with the slice operator as you suggested :) I have been implementing a DALI pipeline for MelGAN on the LJSpeech dataset (the original is written in TensorFlow [code]). It decodes the audio while making all samples the same length, and each sample is then converted to a spectrogram. The details are in the code linked above. Thanks a lot for the help!

JanuszL commented 2 years ago

@byungsoo-oh thanks! We will look into it.