zhaoyun-ai closed this issue 3 years ago
The tensor has the shape (batch_size, num_channels, num_samples).
Batched computation allows one to apply audio augmentation to multiple audio recordings in one pass, and results in faster execution due to parallelism.
Multiple channels are supported, so you can process stereo (or more channels) instead of just mono.
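If you have a tensor that represents 4 stereo audio snippets of 2 seconds each at 16000 Hz, the shape of the tensor would be (4, 2, 32000): 4 recordings in the batch, 2 channels per recording, and 2 s × 16000 samples/s = 32000 samples per channel.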
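To make the layout concrete, here is a minimal sketch in plain Python. The helper `expected_shape` is a hypothetical name used only for illustration and is not part of any library. Nested lists stand in for a real tensor, so only the dimension order is being shown.

```python
def expected_shape(batch_size, num_channels, duration_s, sample_rate):
    """Hypothetical helper: shape of a batch of equally long audio clips."""
    # Samples per channel = duration in seconds * samples per second
    num_samples = int(round(duration_s * sample_rate))
    return (batch_size, num_channels, num_samples)


shape = expected_shape(batch_size=4, num_channels=2, duration_s=2, sample_rate=16000)
print(shape)  # (4, 2, 32000)

# A silent batch built from nested lists, mirroring the tensor layout
batch = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]

first_clip = batch[0]          # dimension 0 selects a recording in the batch
first_channel = first_clip[0]  # dimension 1 selects a channel of that recording
print(len(batch), len(first_clip), len(first_channel))  # 4 2 32000
```

With a real tensor the same indexing applies: the first index picks the recording, the second picks the channel, and the third picks the sample.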
Thank you for your prompt reply! I will close this question.
I want to know what it means when I get a 3-dimensional tensor from the transform. What does each dimension mean?