I followed the Streaming Standalone Inference section in Seamless_Tutorial, and the following error occurs:
/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
building system from dir
Using the cached tokenizer of seamless_streaming_unity. Set `force` to `True` to download again.
2024-06-06 21:26:19,094 INFO -- seamless_communication.streaming.agents.unity_pipeline: Loading the UnitY model: seamless_streaming_unity on device=cuda:1, dtype=torch.float16
Using the cached checkpoint of seamless_streaming_unity. Set `force` to `True` to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set `force` to `True` to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set `force` to `True` to download again.
2024-06-06 21:26:30,545 INFO -- seamless_communication.streaming.agents.unity_pipeline: Loading the Monotonic Decoder model: seamless_streaming_monotonic_decoder on device=cuda:1, dtype=torch.float16
Using the cached checkpoint of seamless_streaming_monotonic_decoder. Set `force` to `True` to download again.
Using cache found in /home/dzr/.cache/torch/hub/snakers4_silero-vad_master
2024-06-06 21:26:39,310 INFO -- seamless_communication.streaming.agents.online_vocoder: Loading the Vocoder model: vocoder_v2 on device=cuda:1, dtype=torch.float16
Using the cached checkpoint of vocoder_v2. Set `force` to `True` to download again.
finished building system
Using cache found in /home/dzr/.cache/torch/hub/snakers4_silero-vad_master
/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py:1194: UserWarning: operator() profile_node %669 : int[] = prim::profile_ivalue(%667)
does not have profile information (Triggered internally at ../torch/csrc/jit/codegen/cuda/graph_fuser.cpp:105.)
return forward_call(*input, **kwargs)
Traceback (most recent call last):
File "/home/dzr/gls/speech/test.py", line 214, in <module>
delays, prediction_lists, speech_durations, target_sample_rate = run_streaming_inference(
File "/home/dzr/gls/speech/test.py", line 131, in run_streaming_inference
output_segments = OutputSegments(system.pushpop(input_segment, system_states))
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 304, in pushpop
self.push(segment, states, upstream_states)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 321, in push
self.push_impl(self.source_module, segment, states, upstream_states)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 296, in push_impl
self.push_impl(child, segment, states, upstream_states)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 296, in push_impl
self.push_impl(child, segment, states, upstream_states)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 296, in push_impl
self.push_impl(child, segment, states, upstream_states)
[Previous line repeated 1 more time]
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 287, in push_impl
segment = module.pushpop(segment, states[module], upstream_states)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/agent.py", line 170, in pushpop
return self.pop(states)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/agent.py", line 134, in pop
action = self.policy(states)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/streaming/agents/online_unit_decoder.py", line 105, in policy
model_output, _, durations = self.model(
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/model.py", line 394, in forward
decoder_output, decoder_padding_mask, durations = self.decode(
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/model.py", line 424, in decode
seqs, padding_mask, durations = self.decoder_frontend(
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/nar_decoder_frontend.py", line 324, in forward
seqs, padding_mask, durations = self.variance_adaptor(
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/length_regulator.py", line 285, in forward
log_durations = self.duration_predictor(seqs, padding_mask, film_cond_emb)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/length_regulator.py", line 179, in forward
seqs = apply_padding_mask(seqs, padding_mask)
File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/nn/padding.py", line 116, in apply_padding_mask
return seqs.where(m, pad_value)
TypeError: where(): argument 'other' (position 2) must be Tensor, not int
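My guess (not confirmed) is that this is a PyTorch version mismatch: on the torch build in my environment, `Tensor.where` only accepts a Tensor for the `other` argument, while fairseq2's `apply_padding_mask` passes a plain int `pad_value`; newer PyTorch releases accept a Python scalar there. A minimal sketch of the failing call and a version-portable form (the variable names here are my own illustration, not the library code):

```python
import torch

# Toy stand-ins for the sequences and padding mask inside apply_padding_mask.
seqs = torch.randn(2, 3)
mask = torch.tensor([[True, False, True], [False, True, True]])

# On older torch this line raises the same TypeError as in the traceback,
# because `other` is a plain int:
#   seqs.where(mask, 0)

# Portable form: wrap the fill value in a 0-dim tensor so it works on
# both older and newer PyTorch versions.
padded = seqs.where(mask, torch.zeros((), dtype=seqs.dtype))
```

If this is indeed the cause, upgrading torch to a version fairseq2 officially supports would presumably also resolve it without patching.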