Open JuanFMontesinos opened 8 months ago
Hi Juan,
For sound source localization/tracking, I think cropping the signals is good enough, especially when training causal systems such as icoDOA. This might be different in cases where cropping could discard part of the desired information (as in speech recognition) or where the reverberant tail is especially important (as in reverberation estimation).
In the case of causal systems, I'm pretty sure of this because the part of the signal you're cropping can't affect the outputs you obtain for the part you keep. In the case of non-causal systems, I wouldn't expect a big impact, but this is just an intuition since I haven't run any experiments to evaluate it.
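A toy check of the causal argument, as a minimal sketch: a random causal FIR filter stands in for a causal model here (it is only an illustration, not the icoDOA network), and the names `causal_filter`, `x`, `h` and `n_keep` are made up for the example. Cropping the tail of the input leaves the outputs for the kept part unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)  # stand-in for a simulated (reverberant) signal
h = rng.standard_normal(32)    # arbitrary FIR taps, stand-in for a causal model


def causal_filter(x, h):
    # Output sample n depends only on x[n], x[n-1], ..., so the system is causal.
    return np.convolve(x, h)[: len(x)]


n_keep = 800                            # hypothetical crop point
y_full = causal_filter(x, h)            # output for the full-length signal
y_crop = causal_filter(x[:n_keep], h)   # output for the cropped signal

# The first n_keep outputs match, so the cropped tail had no influence on them.
assert np.allclose(y_full[:n_keep], y_crop)
```

For a non-causal system the same comparison would differ near the crop point, which is why I can only offer an intuition there.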
Best, David
Hi again, after a long time :)
I have a quick question I hope you can help me with. When simulating signals, the simulated signals end up longer than the dry (source) signals. This makes sense, as it's probably the reverberation still bouncing around for a while. When training with batches, the signals have to be padded, since all the acoustic signals must be the same length to be stacked. It turns out I've always been training with batch size 1 and gradient accumulation, so I've never looked into this in depth.
I was considering two options. 1) The complex approach of padding and keeping track of the padded length to mask the loss and so on. This would also require padding the trajectory signals. a) Do you have any experience with how the model would react to padding? b) Padding with zeros feels pretty bad. Padding the trajectories with the last value seems OK, but not the audio. 2) Cropping the extra length so that they all have the same length. c) Is this bad in terms of modelling?
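To make the two options concrete, here is a rough NumPy sketch. All function names are hypothetical, the signals are treated as 1-D arrays for simplicity (real recordings would be multichannel), and trajectories are assumed to have shape (time, 3).

```python
import numpy as np


def pad_with_mask(signals):
    """Option 1: zero-pad to the longest length and return a validity mask."""
    max_len = max(len(s) for s in signals)
    batch = np.zeros((len(signals), max_len))
    mask = np.zeros((len(signals), max_len), dtype=bool)
    for i, s in enumerate(signals):
        batch[i, : len(s)] = s
        mask[i, : len(s)] = True  # True marks real (unpadded) samples
    return batch, mask


def pad_trajectory_last(traj, target_len):
    """Extend a (time, 3) trajectory by repeating its last position."""
    n_extra = target_len - traj.shape[0]
    return np.pad(traj, ((0, n_extra), (0, 0)), mode="edge")


def masked_mse(pred, target, mask):
    """Mean squared error computed over the valid samples only."""
    return ((pred - target) ** 2)[mask].mean()


def crop_to_length(signals, length):
    """Option 2: drop the extra tail so that every signal has `length` samples."""
    return np.stack([s[:length] for s in signals])
```

Option 2 is clearly less bookkeeping, which is part of why I'm asking whether it hurts.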
What works best in your experience?
Thanks, Juan