Cropping signals or padding

DavidDiazGuerra / icoDOA

Code repository for the paper Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs

GNU Affero General Public License v3.0

Hi again, after a long time :)

I have a quick question I hope you can help me with. When simulating signals, the simulated signals end up longer than the dry (source) signals. This makes sense, since the reverberation presumably keeps ringing for a while after the source stops. When training with batches, the signals have to be padded, because all the acoustic signals in a batch must have the same length to be stacked. It turns out I have always trained with batch size 1 and gradient accumulation, so I never looked into this in depth.
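To illustrate where the extra length comes from, here is a minimal sketch, assuming the simulation amounts to convolving the dry signal with a room impulse response (the simulator used in the repository may do something more elaborate for moving sources). The signal and impulse response below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dry source signal of 1000 samples.
dry = rng.standard_normal(1000)

# Hypothetical impulse response of 300 samples: direct path plus one late echo.
rir = np.zeros(300)
rir[0] = 1.0
rir[-1] = 0.1

# Full convolution: the output is len(dry) + len(rir) - 1 samples long,
# so the reverberant tail extends past the end of the dry signal.
wet = np.convolve(dry, rir)
extra_samples = len(wet) - len(dry)
```

Under this assumption the extra length depends only on the impulse response length, not on the content of the source signal.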

I was considering two options:

1) The more complex approach: padding and keeping track of the padded length so that the loss can be masked, and so on. This would also require padding the trajectory signals.
   a) Do you have any experience with how the model reacts to padding?
   b) Padding with zeros feels pretty bad. Padding the trajectories with their last value seems OK, but not the audio.
2) Cropping the extra length so that all the signals have the same length.
   c) Is this bad in terms of modelling?
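To make the two options concrete, here is a rough sketch of what I have in mind. It assumes NumPy arrays shaped (channels, samples) for the audio and (frames, 3) for the trajectories; the function names `pad_batch`, `masked_mse`, and `crop_batch` are hypothetical and not part of icoDOA.

```python
import numpy as np

def pad_batch(signals, trajectories):
    """Option 1: zero-pad the audio, repeat the last trajectory point,
    and return a boolean mask marking the valid (unpadded) frames."""
    max_len = max(s.shape[-1] for s in signals)
    max_frames = max(t.shape[0] for t in trajectories)
    # Zero-pad each (channels, samples) array at the end of the time axis.
    audio = np.stack([
        np.pad(s, ((0, 0), (0, max_len - s.shape[-1]))) for s in signals
    ])
    # Pad each (frames, 3) trajectory by repeating its last row.
    traj = np.stack([
        np.pad(t, ((0, max_frames - t.shape[0]), (0, 0)), mode="edge")
        for t in trajectories
    ])
    # True where a frame belongs to the original trajectory.
    frame_mask = np.stack([
        np.arange(max_frames) < t.shape[0] for t in trajectories
    ])
    return audio, traj, frame_mask

def masked_mse(pred, target, frame_mask):
    """Squared error averaged over valid frames only."""
    err = ((pred - target) ** 2).sum(axis=-1)  # shape (batch, frames)
    return (err * frame_mask).sum() / frame_mask.sum()

def crop_batch(signals, length):
    """Option 2: keep only the first `length` samples of every signal."""
    return np.stack([s[..., :length] for s in signals])
```

With option 1 the padded frames do not contribute to the loss, but the network still sees the zeros at its input; with option 2 the end of the reverberant tail is simply discarded.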

What is the best in your experience?

Thanks, Juan

Cropping signals or padding #5