facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

seamless-streaming inference error #465

Closed: LesterGong closed this issue 3 months ago

LesterGong commented 3 months ago

I followed the Streaming Standalone Inference section of Seamless_Tutorial, and the following error occurs during inference.

/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
building system from dir
Using the cached tokenizer of seamless_streaming_unity. Set `force` to `True` to download again.
2024-06-06 21:26:19,094 INFO -- seamless_communication.streaming.agents.unity_pipeline: Loading the UnitY model: seamless_streaming_unity on device=cuda:1, dtype=torch.float16
Using the cached checkpoint of seamless_streaming_unity. Set `force` to `True` to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set `force` to `True` to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set `force` to `True` to download again.
2024-06-06 21:26:30,545 INFO -- seamless_communication.streaming.agents.unity_pipeline: Loading the Monotonic Decoder model: seamless_streaming_monotonic_decoder on device=cuda:1, dtype=torch.float16
Using the cached checkpoint of seamless_streaming_monotonic_decoder. Set `force` to `True` to download again.
Using cache found in /home/dzr/.cache/torch/hub/snakers4_silero-vad_master
2024-06-06 21:26:39,310 INFO -- seamless_communication.streaming.agents.online_vocoder: Loading the Vocoder model: vocoder_v2 on device=cuda:1, dtype=torch.float16
Using the cached checkpoint of vocoder_v2. Set `force` to `True` to download again.
finished building system
Using cache found in /home/dzr/.cache/torch/hub/snakers4_silero-vad_master
/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py:1194: UserWarning: operator() profile_node %669 : int[] = prim::profile_ivalue(%667)
 does not have profile information (Triggered internally at ../torch/csrc/jit/codegen/cuda/graph_fuser.cpp:105.)
  return forward_call(*input, **kwargs)
Traceback (most recent call last):
  File "/home/dzr/gls/speech/test.py", line 214, in <module>
    delays, prediction_lists, speech_durations, target_sample_rate = run_streaming_inference(
  File "/home/dzr/gls/speech/test.py", line 131, in run_streaming_inference
    output_segments = OutputSegments(system.pushpop(input_segment, system_states))
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 304, in pushpop
    self.push(segment, states, upstream_states)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 321, in push
    self.push_impl(self.source_module, segment, states, upstream_states)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 296, in push_impl
    self.push_impl(child, segment, states, upstream_states)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 296, in push_impl
    self.push_impl(child, segment, states, upstream_states)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 296, in push_impl
    self.push_impl(child, segment, states, upstream_states)
  [Previous line repeated 1 more time]
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/pipeline.py", line 287, in push_impl
    segment = module.pushpop(segment, states[module], upstream_states)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/agent.py", line 170, in pushpop
    return self.pop(states)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/simuleval/agents/agent.py", line 134, in pop
    action = self.policy(states)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/streaming/agents/online_unit_decoder.py", line 105, in policy
    model_output, _, durations = self.model(
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/model.py", line 394, in forward
    decoder_output, decoder_padding_mask, durations = self.decode(
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/model.py", line 424, in decode
    seqs, padding_mask, durations = self.decoder_frontend(
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/nar_decoder_frontend.py", line 324, in forward
    seqs, padding_mask, durations = self.variance_adaptor(
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/length_regulator.py", line 285, in forward
    log_durations = self.duration_predictor(seqs, padding_mask, film_cond_emb)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/models/unity/length_regulator.py", line 179, in forward
    seqs = apply_padding_mask(seqs, padding_mask)
  File "/home/dzr/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/nn/padding.py", line 116, in apply_padding_mask
    return seqs.where(m, pad_value)
TypeError: where(): argument 'other' (position 2) must be Tensor, not int
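
The last frame of the traceback can be looked at in isolation. The error message indicates that, in this environment, `Tensor.where` rejects a Python `int` as its `other` argument. The sketch below is only an illustration under that assumption: `apply_mask_compat` is a hypothetical helper (not part of fairseq2 or seamless_communication) that wraps the pad value in a 0-dim tensor before calling `where`, and the tensor shapes are made up for the example.

```python
import torch


def apply_mask_compat(seqs: torch.Tensor, keep: torch.Tensor, pad_value: float = 0) -> torch.Tensor:
    """Hypothetical helper: fill positions where `keep` is False with `pad_value`.

    `keep` is a boolean mask of shape (batch, seq_len); `seqs` may have
    extra trailing dimensions, e.g. (batch, seq_len, model_dim).
    """
    # Add trailing singleton dimensions so the mask broadcasts over `seqs`.
    while keep.ndim < seqs.ndim:
        keep = keep.unsqueeze(-1)

    # Wrap the scalar in a 0-dim tensor with the same dtype and device as
    # `seqs`, so `where` always receives a Tensor as `other`.
    other = seqs.new_tensor(pad_value)

    return seqs.where(keep, other)


# Toy input: batch of 2 sequences, length 3, feature size 4.
seqs = torch.ones(2, 3, 4)
keep = torch.tensor([[True, True, False],
                     [True, False, False]])

out = apply_mask_compat(seqs, keep)
```

Passing the pad value as a tensor avoids the `TypeError` regardless of whether the installed PyTorch build accepts a scalar there. Whether patching the call site or changing the installed PyTorch/fairseq2 versions is the appropriate fix for this issue is not established by the log above.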