Open aaronhsueh0506 opened 1 year ago
Hi,
I found that the results are good when using your website. I re-trained the model in Keras, but Keras does not support grouped Conv2DTranspose layers, so I will try to figure out the differences between the Keras and Torch versions.
Best regards, Aaron
Hi,
I am checking the model inputs and found some differences. Using numpy.fft.rfft, a Vorbis window, and stft_norm, I get the same values as with the following stft function:
    stft_norm = 1 / (n_fft ** 2 / (2 * hop))
    spec = torch.stft(
        audio, n_fft=n_fft, hop_length=hop, window=torch.Tensor(vorbis_window(n_fft)),
        return_complex=True, normalized=False, center=False
    ).transpose(1, 2)
But when I send the same signal to df.analysis or df_features in enhance.py, I get a different spec than with this stft function. Is there any difference?
Another question: is dB rescaling important for the ERB features?
Thanks,
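For reference, the NumPy route mentioned above (rfft, Vorbis window, stft_norm) could look roughly like the sketch below. The window formula, the values n_fft = 960 and hop = 480, and the helper name stft_reference are assumptions for illustration and are not taken from this thread. The framing starts at sample 0 without padding, which is meant to mirror center=False.

```python
import numpy as np


def vorbis_window(n_fft: int) -> np.ndarray:
    """Vorbis power-complementary window (assumed definition)."""
    n = np.arange(n_fft) + 0.5
    return np.sin(0.5 * np.pi * np.sin(np.pi * n / n_fft) ** 2)


def stft_reference(audio: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Framed rFFT without centering; returns shape (frames, n_fft // 2 + 1)."""
    window = vorbis_window(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Slice overlapping frames starting at sample 0 (no padding).
    frames = np.stack([audio[t * hop : t * hop + n_fft] for t in range(n_frames)])
    # Normalization factor as defined in the post above.
    stft_norm = 1 / (n_fft ** 2 / (2 * hop))
    return np.fft.rfft(frames * window, axis=-1) * stft_norm


n_fft, hop = 960, 480  # assumed values, adjust to the model config
rng = np.random.default_rng(0)
audio = rng.standard_normal(4800).astype(np.float32)  # synthetic test signal
spec = stft_reference(audio, n_fft, hop)
print(spec.shape)  # (9, 481)
```

Saving this array next to the one produced by the library makes it easier to see whether a mismatch is a constant scale factor, a shifted frame index, or something else.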
The code looks good; I'm not sure where the differences come from. dB scaling is important because the raw amplitude does not correlate well with human loudness perception and is thus not a good feature.
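To illustrate the point about dB scaling, here is a minimal sketch that converts linear band energies to decibels. The function name power_to_db, the eps value, and the energy values are made up for the example; the exact scaling and any further normalization used by the library may differ.

```python
import math


def power_to_db(band_energy, eps=1e-10):
    """Convert linear band energies to decibels; eps avoids log(0)."""
    return [10.0 * math.log10(e + eps) for e in band_energy]


# Hypothetical band energies spanning six orders of magnitude.
erb_energy = [1.0, 1e-2, 1e-4, 1e-6]
erb_db = power_to_db(erb_energy)
print([round(v, 1) for v in erb_db])  # [0.0, -20.0, -40.0, -60.0]
```

On the linear scale the three quieter bands are nearly indistinguishable from zero, while on the dB scale they are evenly spaced, which is closer to how loudness differences are perceived.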
Hi,
I tried the following call in enhance.py:
    spec, erb_feat, spec_feat = df_features(audio, df_state, device=get_device())
and saved spec as a .npy file.
I also computed:
    spec = torch.stft(
        audio, n_fft=n_fft, hop_length=hop, window=torch.Tensor(vorbis_window(n_fft)),
        return_complex=True, normalized=False, center=False
    ).transpose(1, 2) * stft_norm
But these two approaches give different values of spec.
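One way to narrow down such a mismatch is to check whether the two spectrograms differ only by a frame offset. The sketch below is a generic diagnostic, not the library's method: it pads a synthetic signal with zeros at the front to simulate an analysis that starts from a zero-filled buffer, then searches for the frame shift that minimizes the error. Whether df.analysis actually behaves this way is an assumption that this thread does not confirm. The helper names and the rectangular window are chosen for brevity.

```python
import numpy as np


def stft_frames(audio, n_fft, hop):
    """Rectangular-window framed rFFT starting at sample 0 (no centering)."""
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[t * hop : t * hop + n_fft] for t in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def best_frame_shift(spec_a, spec_b, max_shift=4):
    """Find the shift s that minimizes mean |spec_a[t] - spec_b[t + s]|."""
    errors = {}
    for s in range(-max_shift, max_shift + 1):
        a = spec_a[max(0, -s):]
        b = spec_b[max(0, s):]
        n = min(len(a), len(b))  # compare only the overlapping frames
        errors[s] = float(np.mean(np.abs(a[:n] - b[:n])))
    return min(errors, key=errors.get), errors


n_fft, hop = 960, 480  # assumed values
rng = np.random.default_rng(0)
audio = rng.standard_normal(4800)

spec_a = stft_frames(audio, n_fft, hop)
# Simulated alternative: the same signal preceded by n_fft - hop zeros.
padded = np.concatenate([np.zeros(n_fft - hop), audio])
spec_b = stft_frames(padded, n_fft, hop)

shift, errors = best_frame_shift(spec_a, spec_b)
print(shift)  # 1
```

With the two saved .npy arrays loaded in place of spec_a and spec_b, a clear minimum at a nonzero shift would point to a framing or padding difference instead of a different window or normalization.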
Hi Rikorose,
I'm trying to fine-tune some effects. Do you have any suggestions for these points?
Thanks, Aaron