FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model
https://funaudiollm.github.io/

No code-path supporting cache-aware streaming for SenseVoiceSmall Model? #143

Open Jacob-Bishop opened 1 month ago

Jacob-Bishop commented 1 month ago

Notice: In order to resolve issues more efficiently, please raise issues following the template and include details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Does this model support cache-aware streaming?

The paper reference in SenseVoiceEncoderSmall, as well as the forward_chunk methods in SinusoidalPositionEncoder and MultiHeadedAttentionSANM, suggests to me that cache-aware streaming is intended to be supported. However, I don't see a code path in the SenseVoiceSmall model that actually exercises those methods.

Am I missing something? If this is not supported yet, is it planned for the future?
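For reference, the only inference path I can find today is the offline one. Here is a minimal sketch of it, following the AutoModel example in this repo's README (the input path and options below are illustrative, and may differ in your setup):

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(
    model="iic/SenseVoiceSmall",
    trust_remote_code=True,
    vad_model="fsmn-vad",  # VAD segments long audio before offline decoding
    device="cuda:0",
)
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},          # accepted, but the whole utterance is decoded at once
    language="auto",
    use_itn=True,
)
print(rich_transcription_postprocess(res[0]["text"]))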

Looking at the FunASR example for streaming models in the README, I would expect this to support a streaming API like:

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)  # ceil(len(speech) / chunk_stride); the README's len((speech)-1) is a typo
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
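For completeness, the parameters above would come from the same FunASR streaming example; these are the values its README uses for the 600 ms configuration (assumed here, since nothing SenseVoice-specific exists yet):

chunk_size = [0, 10, 5]             # [0, 10, 5] -> 600 ms; [0, 8, 4] -> 480 ms
encoder_chunk_look_back = 4         # chunks the encoder self-attention looks back
decoder_chunk_look_back = 1         # encoder chunks the decoder cross-attention looks back
chunk_stride = chunk_size[1] * 960  # samples per chunk at 16 kHz (600 ms)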

Thanks!

Code

What have you tried?

What's your environment?

pengzhendong commented 1 month ago

You may refer to my repository: https://github.com/pengzhendong/streaming-sensevoice/blob/master/realtime.py
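The chunk-driven loop that repository implements looks roughly like the sketch below. The class name, import, and method signature here are placeholders for illustration only, not the repo's exact API; see realtime.py in the link above for the real interface.

import numpy as np

# Placeholder names for illustration; realtime.py defines the actual
# streaming wrapper and its interface.
from streaming_sensevoice import StreamingSenseVoice  # assumed import

model = StreamingSenseVoice()
chunk_samples = 1600  # 100 ms at 16 kHz (assumed chunk size)
audio = np.zeros(16000 * 5, dtype=np.float32)  # stand-in for mic/file samples
for start in range(0, len(audio), chunk_samples):
    chunk = audio[start:start + chunk_samples]
    is_last = start + chunk_samples >= len(audio)
    for res in model.streaming_inference(chunk, is_last):  # assumed signature
        print(res)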