Can I use shorter training and testing utterances?

Audio-WestlakeU / NBSS

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

MIT License


Can I use shorter training and testing utterances? #27

Open youyou098888 opened 2 months ago

youyou098888 commented 2 months ago

I notice that both the training and testing utterances are 4 seconds long, and that inference is described as "the evaluation utterances are first chunked to 4-second segments and processed by the network, with 2-second overlapping between consecutive segments."

If I want to shrink the input of the network, is there any chance I can use shorter audio, say 200 ms long? Can I use 4-second utterances for training and 200 ms segments for inference? If not, can I use 200 ms for both training and inference?

quancs commented 2 months ago

You can try, but I think 200 ms may not be a good choice for training or inference, as it does not give the neural network enough context to learn or predict.

I notice that both the training and testing utterances are 4 seconds long, and that inference is described as "the evaluation utterances are first chunked to 4-second segments and processed by the network, with 2-second overlapping between consecutive segments."

Note: this configuration (4-second segments with 2-second overlap) is for SpatialNet, not for online SpatialNet.
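
For readers who want to experiment with other segment lengths, the chunked inference quoted above could be sketched roughly as follows. This is only an illustration and not the repository's code: `chunked_inference` and `process` are hypothetical names, the 16 kHz sample rate is an assumption, and the thread does not say how overlapping outputs are recombined, so the Hann-weighted averaging used here is just one plausible choice.

```python
import numpy as np


def chunked_inference(x, process, sr=16000, seg_sec=4.0, hop_sec=2.0):
    """Process a long multichannel signal in overlapping segments.

    x:       array of shape (channels, samples)
    process: hypothetical stand-in for the network; maps a
             (channels, seg_len) array to a (seg_len,) array
    sr:      sample rate in Hz (assumed, not taken from the thread)
    """
    seg = int(round(seg_sec * sr))   # segment length in samples
    hop = int(round(hop_sec * sr))   # step between segment starts
    n = x.shape[-1]

    out = np.zeros(n)
    weight = np.zeros(n)
    # Strictly positive Hann taper so overlapping segments cross-fade.
    # (Assumption: the source does not specify the recombination rule.)
    win = np.hanning(seg + 2)[1:-1]

    start = 0
    while True:
        end = min(start + seg, n)
        chunk = x[..., start:end]
        pad = seg - chunk.shape[-1]
        if pad > 0:
            # Zero-pad the last (or only) segment to the full length.
            chunk = np.pad(chunk, [(0, 0)] * (chunk.ndim - 1) + [(0, pad)])
        y = process(chunk)[: end - start]
        w = win[: end - start]
        out[start:end] += w * y
        weight[start:end] += w
        if end >= n:
            break
        start += hop

    # Normalize by the accumulated window weights.
    return out / weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mix = rng.standard_normal((2, 16000 * 9))        # 9 s, 2 channels
    est = chunked_inference(mix, lambda c: c[0])     # identity "network"
    print(est.shape)
```

Changing `seg_sec` and `hop_sec` (for example to 0.2 and 0.1) lets you try shorter inputs at inference time, though as noted above such short segments may give the network too little context.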