Open Blinorot opened 1 year ago
Hi @Blinorot, as per my understanding:
Since `f` is computed only to obtain the fixed, predetermined `fmelmax` and `fmelmin`, as you mentioned, defining `f` for this purpose is correct but indeed redundant.
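The redundancy can be checked numerically with a minimal sketch. It assumes the common `2595 * log10(1 + f/700)` mel formula for `to_mel`, that `fmelmax` and `fmelmin` are taken as the max and min of the mel-converted `f`, and illustrative values `sample_rate = 16000` and `NFFT = 512`. None of these are confirmed by this thread.

```python
import numpy as np

def to_mel(hz):
    # Assumed mel conversion (common HTK-style formula); the repo's
    # to_mel may differ in detail.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

sample_rate = 16000  # assumed value, for illustration only
NFFT = 512           # assumed value, for illustration only

# Frequency grid as written in the question.
f = int(sample_rate / 2) * np.linspace(0, 1, int(NFFT / 2) + 1)
fmel = to_mel(f)

# Assumed usage: only the extremes of the grid are needed.
fmelmax = np.max(fmel)
fmelmin = np.min(fmel)

# to_mel is monotonically increasing, so the extremes sit at the
# grid endpoints 0 and sample_rate / 2, whatever NFFT is.
same_max = np.isclose(fmelmax, to_mel(sample_rate / 2))
same_min = np.isclose(fmelmin, to_mel(0.0))
print(same_max, same_min)
```

Under these assumptions the two extremes depend only on the endpoints of `f`, so they could be computed directly from `0` and `sample_rate / 2`.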
The two highest frequency bands are not included.
The sinc layer in this work is slightly different from the original SincNet, where the code has `low_hz` and `high_hz` standing for the lowest and highest frequencies of the mel scale (see the librosa documentation). So the default mel scale obtained in this repo has a lowest frequency of 0, and a highest frequency of whatever `filbandwidthsf` has at `self.out_channels`, rather than `self.out_channels+2`.
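The effect of the `+2` followed by the slice can be seen in a small sketch. The `to_mel`/`to_hz` formulas and the values `sample_rate = 16000` and `out_channels = 20` are assumptions for illustration; only the `np.linspace(..., self.out_channels+2)` call and the `[:self.out_channels]` slice come from this thread.

```python
import numpy as np

def to_mel(hz):
    # Assumed mel conversion (common HTK-style formula).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def to_hz(mel):
    # Inverse of the assumed to_mel above.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

sample_rate = 16000  # assumed value, for illustration only
out_channels = 20    # assumed value, for illustration only

fmelmin = to_mel(0.0)
fmelmax = to_mel(sample_rate / 2)

# out_channels + 2 points equally spaced on the mel scale ...
filbandwidthsmel = np.linspace(fmelmin, fmelmax, out_channels + 2)
filbandwidthsf = to_hz(filbandwidthsmel)

# ... of which only the first out_channels are kept, so the two
# highest points (including sample_rate / 2) are dropped.
freq = filbandwidthsf[:out_channels]

print("lowest kept frequency :", freq[0])
print("highest kept frequency:", freq[-1])
print("highest grid frequency:", filbandwidthsf[-1])
```

In this sketch the highest kept frequency is strictly below `sample_rate / 2`, and how far below depends on `out_channels`, since that value sets the spacing of the mel grid.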
> So the default mel scale obtained in this repo has a lowest frequency of 0, and a highest frequency of whatever `filbandwidthsf` has at `self.out_channels`, rather than `self.out_channels+2`.
Hello, thank you, I understand this. However, the question is: why do we choose the highest frequency like that? I think it is an important question because:
1) In the ASVspoof 2021 baseline version of RawNet2, the highest frequency is always `sr/2`.
2) It is not obvious why the highest frequency should depend on the value of `self.out_channels`.
3) If we change the highest frequency to `sr/2` by fixing this `+2`, the linear-scale and inverse mel-scale versions of RawNet2 in this repository start to overfit and do not produce the results depicted in the paper. If we do not remove this `+2`, the model works well. So this `+2` is important, but I did not find anything about this in the paper.
Dear author,
I was trying to understand how the sinc-layer in your code works. Could you, please, explain two lines in this part:
1) Why do we need `f=int(self.sample_rate/2)*np.linspace(0,1,int(NFFT/2)+1)`? It seems that `fmelmax` is always equal to `self.to_mel(int(self.sample_rate/2))` and `fmelmin` is always equal to `self.to_mel(0)`. We just do not use the fact that `f` is a linspace.
2) Why do we need `+2` in `filbandwidthsmel=np.linspace(fmelmin,fmelmax,self.out_channels+2)`? It seems that `self.freq=filbandwidthsf[:self.out_channels]` does not include the two highest frequencies because of this line. I could not find a note about that in the paper.
Thank you.