X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

about prompt for asr-llm #122

Closed · fclearner closed this 3 months ago

fclearner commented 3 months ago

Hi, @ddlBoJack

I noticed that the training prompt is more complex: "Transcribe speech to text. Output the transcription directly without redundant content. Ensure that the output is not duplicated." However, the inference prompt is much simpler: "speech transcribe."

Is there any research on prompt methods for ASR-LLM that suggests prompts should be simpler during inference?

Thank you in advance for your response.

ddlBoJack commented 3 months ago

Perhaps there is a misunderstanding. We use the same prompt, "Transcribe speech to text.", for both training and inference.

fclearner commented 3 months ago

> Perhaps there is a misunderstanding. We use the same prompt, "Transcribe speech to text.", for both training and inference.

Thanks for the reply. In speech_dataset.py, the prompt is set to "Transcribe speech to text. Output the transcription directly without redundant content. Ensure that the output is not duplicated." when prompt is None; however, the Librispeech example sets prompt to None: https://github.com/X-LANCE/SLAM-LLM/blob/683122402391806512a9d13febe8a952bb7c406e/src/slam_llm/datasets/speech_dataset.py#L112
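
For reference, here is a minimal sketch of that fallback behavior (function and variable names are illustrative, not the exact SLAM-LLM code):

```python
# Sketch of the prompt fallback described above; names are hypothetical.
DEFAULT_PROMPT = (
    "Transcribe speech to text. Output the transcription directly "
    "without redundant content. Ensure that the output is not duplicated."
)

def resolve_prompt(configured_prompt: str | None) -> str:
    # When the dataset config leaves the prompt unset (None), the long
    # default prompt above is used. The Librispeech example config sets
    # prompt to None, so it ends up training with the long prompt.
    if configured_prompt is None:
        return DEFAULT_PROMPT
    return configured_prompt
```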

PigeonDan1 commented 3 months ago

@fclearner I think a longer prompt also increases training and inference time; that might be an important reason.

fclearner commented 3 months ago

> @fclearner I think a longer prompt also increases training and inference time; that might be an important reason.

Yes, I agree with you. However, the Seed-ASR paper introduces a context-prompt method that performs well on domain-specific ASR; maybe a small sacrifice is worthwhile.
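
As a rough illustration of what such a context prompt might look like (the exact Seed-ASR prompt format is not reproduced here; the wording and helper below are hypothetical):

```python
# Hypothetical sketch of a context-biased ASR prompt, in the spirit of
# Seed-ASR's context conditioning; not taken from any actual codebase.
def build_context_prompt(hotwords: list[str]) -> str:
    # Prepend domain context (e.g., rare names or terms) so the LLM can
    # bias its transcription toward them.
    context = ", ".join(hotwords)
    return (
        f"Relevant context: {context}. "
        "Transcribe speech to text. Output the transcription directly."
    )

print(build_context_prompt(["SLAM-LLM", "Librispeech"]))
# Relevant context: SLAM-LLM, Librispeech. Transcribe speech to text. ...
```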