Open youyou098888 opened 2 months ago
You can try, but I think 200 ms may not be a good choice for training or inference: that is not enough context for the neural network to learn from or predict with.
> I notice that both training and testing utterances are 4 seconds long and the inference is "the evaluation utterances are first chunked to 4-second segments and processed by the network, with 2-second overlapping between consecutive segments."
Note: this configuration is for SpatialNet, not for online SpatialNet.
I notice that both training and testing utterances are 4 seconds long, and that inference is described as: "the evaluation utterances are first chunked to 4-second segments and processed by the network, with 2-second overlapping between consecutive segments."
If I want to shrink the input of the network, is there any chance I can use it on shorter audio, say 200 ms long? Can I use 4 seconds for training and 200 ms for inference? If not, can I use 200 ms for both training and inference?
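For reference, here is a rough sketch of what the quoted chunking (4-second segments, 2-second overlap) could look like in Python. It is not taken from the SpatialNet code: the function name `chunked_inference`, the `process` callable standing in for the network, the 8 kHz sample rate, the zero-padding of the last segment, and the averaging of overlapping regions are all assumptions made for illustration. The quote does not say how overlapping outputs are combined.

```python
import numpy as np


def chunked_inference(x, process, sr=8000, seg_sec=4.0, hop_sec=2.0):
    """Run `process` on overlapping segments of a 1-D signal and average overlaps.

    `process` is a hypothetical stand-in for the network: it takes one
    segment and returns an array of the same length.
    """
    seg = int(round(seg_sec * sr))   # segment length in samples
    hop = int(round(hop_sec * sr))   # step between segment starts
    n = len(x)

    # Zero-pad so that every segment, including the last, is full length
    # (assumption: the real pipeline may handle the tail differently).
    if n <= seg:
        n_pad = seg
    else:
        n_pad = seg + int(np.ceil((n - seg) / hop)) * hop
    padded = np.zeros(n_pad, dtype=np.float64)
    padded[:n] = x

    out = np.zeros(n_pad)
    weight = np.zeros(n_pad)
    for start in range(0, n_pad - seg + 1, hop):
        out[start:start + seg] += process(padded[start:start + seg])
        weight[start:start + seg] += 1.0

    # Every sample is covered by at least one segment, so weight >= 1.
    return (out / weight)[:n]


if __name__ == "__main__":
    signal = np.random.randn(10 * 8000)           # 10 s of noise at 8 kHz
    result = chunked_inference(signal, lambda s: s)  # identity "network"
    print(result.shape)
```

Note that with this scheme a 200 ms input would simply be padded up to a single 4-second segment, so the network still sees a 4-second input; actually shrinking the input means changing `seg_sec`, which is what the question is about.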