castorini / howl

Wake word detection modeling toolkit for Firefox Voice, supporting open datasets like Speech Commands and Common Voice.
Mozilla Public License 2.0
194 stars, 28 forks

Pretrained model streaming runtime error. #112

Open adib-vali opened 2 years ago

adib-vali commented 2 years ago

I wanted to try a demo of the project with the pretrained model, but this error occurred:

2022-04-13 20:36:43 WARNING setup_logger(30) Removing existing handlers from HowlClient logger
2022-04-13 20:36:43,874 INFO setup_logger(54) Set up logger (HowlClient), output path: None
Using cache found in /home/adib/.cache/torch/hub/castorini_howl_master
2022-04-13 20:36:44.069002: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-04-13 20:36:44 INFO _init_num_threads(157) NumExpr defaulting to 4 threads.
2022-04-13 20:36:45 INFO __init__(97) target hey is assigned to label 0
2022-04-13 20:36:45 INFO __init__(97) target fire is assigned to label 1
2022-04-13 20:36:45 INFO __init__(97) target fox is assigned to label 2
2022-04-13 20:36:45 INFO __init__(97) target [OOV] is assigned to label 3
ALSA lib pcm_dsnoop.c:638:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1075:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_dmix.c:1075:(snd_pcm_dmix_open) unable to open slave
2022-04-13 20:36:45,478 INFO start(140) Starting Howl inference client...
torch.Size([8000])
torch.Size([1, 40, 41])
Traceback (most recent call last):
  File "/home/adib/Projects/wake word detection/howl/howl/client/howl_client.py", line 95, in _on_audio
    if self.engine.infer(inp):
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/adib/Projects/wake word detection/howl/howl/model/inference.py", line 240, in infer
    self.ingest_frame(window.squeeze(0), self.curr_time)
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/adib/Projects/wake word detection/howl/howl/model/inference.py", line 263, in ingest_frame
    transformed_frame = self.zmuv(self.std(frame.unsqueeze(0)))
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/adib/Projects/wake word detection/howl/howl/data/transform/transform.py", line 77, in forward
    x = self.passthrough(x, **kwargs)
  File "/home/adib/Projects/wake word detection/howl/howl/data/transform/transform.py", line 241, in passthrough
    return self._execute_op(self.spec_transform, audio, **kwargs)
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/adib/Projects/wake word detection/howl/howl/data/transform/transform.py", line 229, in _execute_op
    if not deltas_only : logmels = op(audio).add(1e-7).log_().contiguous()
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torchaudio/transforms.py", line 480, in forward
    specgram = self.spectrogram(waveform)
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torchaudio/transforms.py", line 96, in forward
    return F.spectrogram(
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 91, in spectrogram
    spec_f = torch.stft(
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/functional.py", line 578, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
  File "/home/adib/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 4006, in _pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 120, 41]
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    client.start().join()
  File "/home/adib/Projects/wake word detection/howl/howl/client/howl_client.py", line 148, in join
    time.sleep(0.04)
RuntimeError

Do you know how I can solve it?
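The last frames of the traceback can be reproduced with plain PyTorch, without Howl. With `center=True`, `torch.stft` pads each side of the signal by `n_fft // 2` samples, and the error message states that reflection padding must be smaller than the input's last dimension. `padding (256, 256)` therefore implies `n_fft = 512`; that value is inferred from the message, not read from Howl's config, and the random tensors below are stand-ins for real audio. The sketch shows that an 8000-sample waveform (matching the `torch.Size([8000])` printed in the log) passes through the STFT, while a tensor with only 41 values on its last axis (like the `[1, 120, 41]` input in the error) does not. One plausible reading, not confirmed here, is that the spectrogram transform in `ingest_frame` received already-computed features (41 frames, the same count as the printed `[1, 40, 41]` tensor) instead of raw audio.

```python
import torch

# Assumption: n_fft = 512, inferred from "padding (256, 256)" because
# center=True pads n_fft // 2 samples on each side of the signal.
N_FFT = 512


def try_stft(x: torch.Tensor) -> bool:
    """Return True if a centered, reflect-padded STFT succeeds on x."""
    try:
        torch.stft(
            x,
            n_fft=N_FFT,
            window=torch.hann_window(N_FFT),
            center=True,
            pad_mode="reflect",
            return_complex=True,
        )
        return True
    except RuntimeError as err:
        # Reflection padding fails when the pad is not smaller than the last dimension.
        print(f"{tuple(x.shape)}: {err}")
        return False


# Raw waveform: 8000 samples, far more than the 256 samples padded per side.
waveform_ok = try_stft(torch.randn(8000))

# Feature-like tensor: only 41 values on the last axis, fewer than 256.
features_ok = try_stft(torch.randn(120, 41))

print(waveform_ok, features_ok)
```

If the shape of the tensor reaching `self.std(...)` in `ingest_frame` looks like the second case rather than the first, that would point to a mismatch between the pretrained model's expected input and what the inference engine passes in.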

ljj7975 commented 2 years ago

What process did you follow?