Enny1991 / RealMuD

Only 1 mic for the Conv layers of the model? #2

Open dyustc opened 1 year ago

dyustc commented 1 year ago

Hi @Enny1991, thanks for your advice last time. I am reading the paper and the code, and so far I have four main questions.

  1. I see that only one mic (arbitrarily chosen among the mics; in the code it is always mic 0) is used as the training input for the mask output. The mask is then used to compute the PSD matrices and the weights, which, as you said, are all differentiable operations, with no other neural layers involved. Is this correct? Is this done for computational reasons, or was it also found to perform well?

  2. What is the training target? I suppose it is the clean speech with a little early reverberation, or is it purely clean speech?

  3. I saw some version tags in the code. Which combination works best? For the STFT version, is it the causal v5 one?

  4. The model is based on the TCN architecture, right? I am not very familiar with this architecture, so I may have to catch up first.
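To make question 1 concrete, here is a minimal NumPy sketch of the usual mask-weighted spatial covariance (PSD) estimate, where a single real-valued mask (for example, one predicted from mic 0 only) is applied to every channel. This is an assumption about the general technique, not the repo's PyTorch code; the function name `masked_covariances` and the shapes are hypothetical.

```python
import numpy as np

def masked_covariances(Y, mask, eps=1e-8):
    """Hypothetical helper: mask-weighted speech/noise covariance per bin.

    Y    : complex STFT, shape (M, F, T)  (M mics, F bins, T frames)
    mask : real speech mask, shape (F, T), e.g. predicted from one mic only
    returns (phi_s, phi_n), each of shape (F, M, M)
    """
    Yf = np.transpose(Y, (1, 2, 0))                     # (F, T, M)
    # outer products y y^H for every bin and frame -> (F, T, M, M)
    outer = Yf[..., :, None] * Yf[..., None, :].conj()
    ms = mask[..., None, None]                          # same mask for all mics
    mn = (1.0 - mask)[..., None, None]
    # mask-weighted time averages
    phi_s = (ms * outer).sum(axis=1) / (mask.sum(axis=1)[:, None, None] + eps)
    phi_n = (mn * outer).sum(axis=1) / ((1.0 - mask).sum(axis=1)[:, None, None] + eps)
    return phi_s, phi_n

rng = np.random.default_rng(0)
M, F, T = 4, 257, 100
Y = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
mask = rng.uniform(size=(F, T))   # stand-in for the network's single-mic mask
phi_s, phi_n = masked_covariances(Y, mask)
print(phi_s.shape)  # (257, 4, 4)
```

Every step is a weighted sum of products, so the same computation written with PyTorch tensors stays differentiable with respect to the mask, which is what lets a single-mic mask network be trained through the beamformer.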

Not sure if I asked the questions correctly, but I look forward to your reply.

Thanks

dyustc commented 1 year ago

There is also a more specific issue. According to the equation

[Screenshot 2023-07-18 17:05:27: equation from the paper]

the code is written like this, where A is the covariance matrix of the speech and B is that of the noise, which is the other way around from the equation in the paper. I suppose the code is wrong here?

[Screenshot 2023-07-18 17:07:05: the corresponding code]
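The screenshots are not reproduced here, so the exact equation is unknown. Assuming it is the common Souden-style mask-based MVDR, w = (Φ_N^{-1} Φ_S / tr(Φ_N^{-1} Φ_S)) u, the noise covariance is the matrix that gets inverted, i.e. it is the left-hand side of the linear solve. Whether an A/B naming is "the other way around" therefore depends on which solve argument each matrix is passed to. The sketch below (hypothetical name `mvdr_souden`) checks the distortionless property on a rank-1 speech model.

```python
import numpy as np

def mvdr_souden(phi_s, phi_n, ref=0):
    """Assumed Souden-style MVDR: w = (Phi_N^-1 Phi_S) u / tr(Phi_N^-1 Phi_S).

    phi_s, phi_n : (F, M, M) speech / noise covariance matrices
    ref          : reference-mic index (the one-hot vector u)
    returns w, shape (F, M)
    """
    num = np.linalg.solve(phi_n, phi_s)         # Phi_N^-1 Phi_S, no explicit inverse
    trace = np.trace(num, axis1=-2, axis2=-1)   # (F,)
    return num[..., :, ref] / trace[:, None]    # column `ref` == (.) u

rng = np.random.default_rng(1)
F, M = 3, 4
d = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))    # steering vectors
phi_s = 2.0 * d[:, :, None] * d[:, None, :].conj()                    # rank-1 speech
A = rng.standard_normal((F, M, M)) + 1j * rng.standard_normal((F, M, M))
phi_n = A @ np.conj(np.swapaxes(A, -1, -2)) + 0.1 * np.eye(M)         # Hermitian PD noise
w = mvdr_souden(phi_s, phi_n, ref=0)
# distortionless check: w^H d equals the reference-mic entry of d
print(np.allclose(np.einsum("fm,fm->f", w.conj(), d), d[:, 0]))  # True
```

Swapping the two arguments would invert the (near rank-1) speech covariance instead, which does not give a distortionless response toward the speaker.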

dyustc commented 1 year ago

I trained it on our own dataset and found that the linear matrix equation solved for the MVDR weights can sometimes be singular, so I have to handle that case.

Sometimes I add a small epsilon in the computation; other times I have to use torch.linalg.pinv to get the pseudo-inverse of the singular matrix, but in that case the loss is not good.
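One generic alternative to a pseudo-inverse is diagonal loading (Tikhonov regularization) of Φ_N before the solve, scaled by the per-bin power so that a single epsilon works across bins. This is a standard technique, not necessarily what the repo authors did; the name `mvdr_souden_loaded`, the Souden-style MVDR form and the `eps` values are assumptions. The same pattern ports directly to `torch.linalg.solve`.

```python
import numpy as np

def mvdr_souden_loaded(phi_s, phi_n, ref=0, eps=1e-6):
    """Hypothetical helper: Souden-style MVDR with diagonal loading on Phi_N."""
    M = phi_n.shape[-1]
    # loading proportional to the average per-mic noise power in each bin
    power = np.real(np.trace(phi_n, axis1=-2, axis2=-1)) / M       # (F,)
    loaded = phi_n + (eps * power + 1e-10)[:, None, None] * np.eye(M)
    num = np.linalg.solve(loaded, phi_s)        # always well-posed after loading
    trace = np.trace(num, axis1=-2, axis2=-1)
    # guard the normalisation too, since the trace can approach zero
    return num[..., :, ref] / (trace[:, None] + 1e-10)

rng = np.random.default_rng(2)
F, M = 2, 4
d = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
phi_s = d[:, :, None] * d[:, None, :].conj()
v = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
phi_n = v[:, :, None] * v[:, None, :].conj()   # rank-1 noise: singular for M > 1
phi_n[1] = 0.0                                 # an all-zero bin, exactly singular
w = mvdr_souden_loaded(phi_s, phi_n)
print(np.all(np.isfinite(w)))  # True
```

Unlike a pseudo-inverse, which silently discards near-zero directions, loading keeps every bin full-rank, so the solve and its gradients behave smoothly; `eps` trades noise suppression against numerical stability.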

So I wonder if you had this problem before and how you solved it.

cheers, Yi

dyustc commented 1 year ago

Regarding the A/B question above: I guess the code is right after all. When I tested it the other way around, the loss did not decrease.