FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model
https://funaudiollm.github.io/

The demo.py can not work correctly #147

Open chongkuiqi opened 1 month ago

chongkuiqi commented 1 month ago

Notice: In order to resolve issues more efficiently, please raise issues following the template.

🐛 Bug

When I run demo.py, the error is:

Traceback (most recent call last):
  File "/home/haige/ckq/arm-llm-dev/arm_crl/multimodal/sensevoice.py", line 18, in <module>
    res = model.generate(
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 303, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 376, in inference_with_vad
    res = self.inference(
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 342, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/haige/miniconda3/lib/python3.10/site-packages/funasr/models/fsmn_vad_streaming/model.py", line 690, in inference
    audio_sample = torch.cat((cache["prev_samples"], audio_sample_list[0]))
TypeError: expected Tensor as element 1 in argument 0, but got str
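
For context, this is the TypeError that torch.cat raises when one of its inputs is a string instead of a tensor: here audio_sample_list[0] is still the raw file path rather than loaded audio samples. A minimal sketch that reproduces the same error type (hypothetical path, only to illustrate the failure mode):

import torch

prev_samples = torch.zeros(0)
# Passing a file path string where a Tensor of audio samples is expected
# raises the same error seen in fsmn_vad_streaming/model.py:
torch.cat((prev_samples, "/path/to/en.mp3"))
# TypeError: expected Tensor as element 1 in argument 0, but got str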

Code sample

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "/home/haige/ckq/arm-llm-dev/arm_crl/multimodal/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    trust_remote_code=True,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  # merge short VAD segments before recognition
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

Expected behavior

Environment

  • OS : Ubuntu 20.04
  • FunASR Version : 1.1.12
  • ModelScope Version : 1.15.0
  • PyTorch Version : 2.2.2+cu121
  • How you installed funasr: pip
  • Python version: 3.10
  • GPU : NVIDIA 3090
  • CUDA/cuDNN version : cuda12.1

Problem solved. Don't use the model downloaded from Hugging Face; use the model that FunASR auto-downloads from ModelScope.
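
For reference, a minimal sketch of the working setup, assuming the ModelScope model ID iic/SenseVoiceSmall used in the repository's demo; passing the ID instead of a local Hugging Face snapshot lets FunASR resolve and download the model (and the fsmn-vad model) from ModelScope automatically:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Let FunASR fetch SenseVoiceSmall from ModelScope by its model ID
# instead of pointing at a locally downloaded Hugging Face snapshot.
model = AutoModel(
    model="iic/SenseVoiceSmall",
    trust_remote_code=True,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

res = model.generate(
    input=f"{model.model_path}/example/en.mp3",  # bundled example audio
    cache={},
    language="auto",
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
print(rich_transcription_postprocess(res[0]["text"]))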