dyustc opened this issue 1 year ago
And here is a specific problem. In the code, A is the covariance matrix of speech and B is the covariance matrix of noise, which is the other way around from the equation in the paper. I suppose the code is wrong here?
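For what it's worth, `torch.linalg.solve(A, B)` computes `A^{-1} B`, so with the usual MVDR solution `w = Phi_NN^{-1} Phi_SS u / trace(Phi_NN^{-1} Phi_SS)` (with `u` selecting the reference mic), the noise covariance should be the first argument. A minimal sketch of what I would expect, with my own placeholder names and shapes rather than the repo's:

```python
import torch

F_bins, M = 257, 4  # frequency bins, microphones (illustrative)
# Placeholder Hermitian positive-(semi)definite covariances, shape (F, M, M).
a = torch.randn(F_bins, M, M, dtype=torch.cfloat)
phi_ss = a @ a.mH                                             # speech covariance
b = torch.randn(F_bins, M, M, dtype=torch.cfloat)
phi_nn = b @ b.mH + 1e-3 * torch.eye(M, dtype=torch.cfloat)   # noise covariance

# torch.linalg.solve(A, B) returns A^{-1} B, so Phi_NN^{-1} Phi_SS
# needs the *noise* covariance in the first slot:
num = torch.linalg.solve(phi_nn, phi_ss)        # (F, M, M)
tr = num.diagonal(dim1=-2, dim2=-1).sum(-1)     # trace per frequency, (F,)
w = num[..., 0] / tr[:, None]                   # weights for reference mic 0
```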
I trained it on our own dataset and found that the linear system solved for the MVDR weights can sometimes be singular, which I have to deal with. Sometimes I add a small epsilon to the matrix; sometimes I have to use torch.linalg.pinv to get a pseudo-inverse of the singular matrix, but in that case the loss is not good.
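To be concrete about the two workarounds: the epsilon is diagonal loading of the noise covariance, and the pinv path is the fallback whose loss behaves worse. A rough sketch with my own names (`regularized_solve`, `eps`), not the actual training code:

```python
import torch

def regularized_solve(phi_nn, phi_ss, eps=1e-6):
    """Compute Phi_NN^{-1} Phi_SS with diagonal loading, falling back to pinv."""
    M = phi_nn.shape[-1]
    eye = torch.eye(M, dtype=phi_nn.dtype, device=phi_nn.device)
    # Scale eps by the mean diagonal energy so the loading is level-invariant.
    load = eps * phi_nn.diagonal(dim1=-2, dim2=-1).mean(-1).real
    phi_nn = phi_nn + load[..., None, None] * eye
    try:
        return torch.linalg.solve(phi_nn, phi_ss)
    except RuntimeError:
        # Pseudo-inverse as a last resort; gradients through pinv near a
        # rank drop can be ill-behaved, which may explain the worse loss.
        return torch.linalg.pinv(phi_nn) @ phi_ss
```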
So I wonder whether you have had this problem before and how you solved it.
cheers, Yi
> In the code, A is the covariance matrix of speech and B is the covariance matrix of noise, which is the other way around from the equation in the paper. I suppose the code is wrong here?
I guess the code is right after all: I tested it the other way around, and the loss would not decrease.
Hi @Enny1991, thanks for your advice last time. I am reading the paper and the code, and I have four main questions so far:
1. I see that only one mic (arbitrarily chosen; in the code it is always mic 0) is used as the network input for mask estimation. The mask is then used to compute the PSDs and the MVDR weights, which are, as you said, all differentiable operations, with no further neural layers involved (see the sketch after this list). Is this correct? Is this done for computational reasons, or was it also found to perform well?
2. What is the training target? I suppose clean speech with a little early reverberation, or pure clean speech?
3. I saw there are some version tags in the code. Which combination works best? For the STFT version, is it the v5 causal one?
4. The model is based on the TCN architecture, right? I am not that familiar with this architecture, so maybe I have to catch up first.
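To make question 1 concrete, here is a minimal sketch of the pipeline as I currently understand it: the mask network sees only reference mic 0, and the mask-weighted PSDs then feed the MVDR solve with no further learned layers. All names, shapes, and the placeholder `mask_net` below are my own illustration, not the repo's code:

```python
import torch

def masked_psd(spec, mask):
    """Mask-weighted spatial covariance. spec: (F, T, M) complex, mask: (F, T) in [0, 1]."""
    weighted = mask[..., None] * spec                          # mask each TF bin
    psd = torch.einsum('ftm,ftn->fmn', weighted, spec.conj())  # sum over time
    return psd / mask.sum(dim=1).clamp(min=1e-8)[:, None, None]

F_bins, T, M = 257, 100, 4
spec = torch.randn(F_bins, T, M, dtype=torch.cfloat)          # multichannel STFT (placeholder)
mask_net = lambda ref: torch.sigmoid(torch.randn(F_bins, T))  # stands in for the TCN

speech_mask = mask_net(spec[..., 0])          # network sees only mic 0
phi_ss = masked_psd(spec, speech_mask)        # speech covariance, (F, M, M)
phi_nn = masked_psd(spec, 1.0 - speech_mask)  # noise covariance, (F, M, M)
# From here the MVDR weights follow from phi_ss and phi_nn as in the
# snippet above, with no learned layers after the mask network.
```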
Not sure whether I phrased the questions correctly, but I am looking forward to your reply.
Thanks