pongthang opened this issue 1 month ago
Hello! Thank you for submitting the issue. I see the problem is in the features, specifically in the torchaudio package:
```
from user code:
  File "/home/miko/.cache/torch/hub/IDRnD_ReDimNet_master/redimnet.py", line 953, in forward
    x = self.spec(x).unsqueeze(1)
  File "/home/miko/miniconda3/envs/speaker_recog/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/miko/.cache/torch/hub/IDRnD_ReDimNet_master/redimnet.py", line 116, in forward
    x = self.torchfbank(x)+1e-6
  File "/home/miko/miniconda3/envs/speaker_recog/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/miko/miniconda3/envs/speaker_recog/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 619, in forward
    specgram = self.spectrogram(waveform)
  File "/home/miko/miniconda3/envs/speaker_recog/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/miko/miniconda3/envs/speaker_recog/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 110, in forward
    return F.spectrogram(
  File "/home/miko/miniconda3/envs/speaker_recog/lib/python3.10/site-packages/torchaudio/functional/functional.py", line 119, in spectrogram
    frame_length_norm, window_norm = _get_spec_norms(normalized)
  File "/home/miko/miniconda3/envs/speaker_recog/lib/python3.10/site-packages/torchaudio/functional/functional.py", line 233, in _get_spec_norms
    if torch.jit.isinstance(normalized, str):
```
We are going to release more accurate models soon, pretrained on voxblink2 + cnceleb + vox2, and we'll fine-tune them with different features that are based on conv1d operations and should be convertible to ONNX.
Thank you for your reply. Regarding "that are based on conv1d operations and that should be convertible to onnx" — will it be similar to https://github.com/adobe-research/convmelspec, "Convmelspec: Convertible Melspectrograms via 1D Convolutions"?
Yes, it will be similar in the sense that both solutions use convolution of the signal with discrete Fourier transform kernels, but the implementations will differ.
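For readers unfamiliar with the idea, the core trick behind both approaches can be sketched in a few lines: a magnitude STFT is just a `Conv1d` whose fixed kernels are the windowed DFT basis, which ONNX exports without any `torch.stft` op. This is an illustrative sketch only (the `ConvSTFT` name and parameters are hypothetical, not the actual ReDimNet or convmelspec implementation):

```python
import math

import torch
import torch.nn as nn


class ConvSTFT(nn.Module):
    """Magnitude STFT implemented as a Conv1d with fixed DFT kernels (ONNX-friendly sketch)."""

    def __init__(self, n_fft: int = 512, hop_length: int = 160):
        super().__init__()
        self.n_fft, self.hop_length = n_fft, hop_length
        window = torch.hann_window(n_fft)
        n = torch.arange(n_fft, dtype=torch.float32)
        k = torch.arange(n_fft // 2 + 1, dtype=torch.float32).unsqueeze(1)
        # Windowed real and imaginary DFT basis, stacked as conv kernels.
        real = torch.cos(2 * math.pi * k * n / n_fft) * window
        imag = -torch.sin(2 * math.pi * k * n / n_fft) * window
        kernels = torch.cat([real, imag], dim=0).unsqueeze(1)  # (2*(n_fft//2+1), 1, n_fft)
        self.register_buffer("kernels", kernels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, samples) -> (batch, n_fft//2 + 1, frames)
        out = torch.nn.functional.conv1d(x.unsqueeze(1), self.kernels, stride=self.hop_length)
        n_bins = self.n_fft // 2 + 1
        re, im = out[:, :n_bins], out[:, n_bins:]
        return torch.sqrt(re**2 + im**2 + 1e-12)
```

With a Hann window and `center=False`, this matches `torch.stft(...).abs()` up to float32 rounding, while using only operators the ONNX exporter handles natively.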
Hi, I tried to find a workaround for this while you are developing the custom spectrogram implementation. Following a GitHub issue, I can now export the ReDimNet model to ONNX successfully, but I am not sure how it affects model performance. Here is the modification, inside torchaudio/functional/functional.py --> def spectrogram(), around line 126:
```python
.....
# default values are consistent with librosa.core.spectrum._spectrogram
spec_f = torch.stft(
    input=waveform,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=win_length,
    window=window,
    center=center,
    pad_mode=pad_mode,
    normalized=frame_length_norm,
    onesided=onesided,
    # return_complex=True,
    return_complex=False,  # new change
)
# From real and imaginary values to absolute value
spec_f = torch.sqrt(torch.pow(spec_f[:, :, :, 0], 2.0) + torch.pow(spec_f[:, :, :, 1], 2.0))  # added line
# unpack batch
...
```
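As a quick sanity check on the math in that patch (a standalone snippet, not the torchaudio patch itself): the magnitude computed from the stacked real/imaginary pair is identical to `torch.abs()` of the complex STFT, so the modification should be numerically equivalent to the original code path:

```python
import torch

wave = torch.randn(1, 16000)
window = torch.hann_window(400)

# Complex STFT, then view it as a (..., 2) real tensor — the same layout
# that return_complex=False used to produce.
complex_spec = torch.stft(wave, n_fft=400, hop_length=160, window=window,
                          return_complex=True)
real_spec = torch.view_as_real(complex_spec)
mag_from_real = torch.sqrt(real_spec[..., 0] ** 2 + real_spec[..., 1] ** 2)

assert torch.allclose(mag_from_real, complex_spec.abs(), atol=1e-5)
```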
Code used for exporting:

```python
model.eval()
with torch.amp.autocast("cuda", enabled=False):
    with torch.no_grad():
        torch.onnx.export(
            model,
            input_sample,
            "model_success_redimnet.onnx",
        )
```
Is this good enough or will this affect the model performance?
@pongthang Thanks for looking into it. I'm sorry, but we currently don't have the resources and time to evaluate the method you proposed. It should be pretty simple to check model performance after conversion.
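One minimal way to do such a check is to run the same waveforms through both backends and compare the resulting embeddings, e.g. by max absolute difference and per-sample cosine similarity (the score used for speaker verification). A sketch, assuming you already have the two embedding tensors (`compare_embeddings` and the tensor shapes here are hypothetical placeholders):

```python
import torch
import torch.nn.functional as F


def compare_embeddings(emb_torch: torch.Tensor, emb_onnx: torch.Tensor):
    """Compare (batch, dim) embeddings from the PyTorch model and the ONNX export.

    Returns the max absolute elementwise difference and the per-sample
    cosine similarity between the two sets of embeddings.
    """
    max_abs_diff = (emb_torch - emb_onnx).abs().max().item()
    cos = F.cosine_similarity(emb_torch, emb_onnx, dim=1)
    return max_abs_diff, cos


# Hypothetical usage: stand-in tensors simulating tiny numerical drift
# between backends (in practice, feed the same audio to both models).
torch.manual_seed(0)
a = torch.randn(4, 192)
b = a + 1e-5 * torch.randn(4, 192)
diff, cos = compare_embeddings(a, b)
```

If the max difference stays at float-rounding scale and cosine similarity is essentially 1.0, the export should not change verification scores.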
Happy to share good news: we have released the first models pretrained on voxblink2. You can find them on the evaluation page; the first example there is the best voxblink2-finetuned model. These models still have the same features, but we are going to switch to convertible audio features with the next batch of models.
@pongthang Hi, it's great to hear that the model can be exported to ONNX. How are the performance and speed of the ONNX model?
Hi, performance is good, the same as PyTorch. Speed is also similar.
I am trying to export the ReDimNet model from PyTorch to ONNX. Please help me out. The code I use is:
The error is: