fclearner closed this issue 3 months ago.
Maybe there is a misunderstanding. We use the same prompt, "Transcribe speech to text. ", for both training and inference.
Thanks for the reply. In speech_dataset.py, the prompt falls back to "Transcribe speech to text. Output the transcription directly without redundant content. Ensure that the output is not duplicated." when prompt is None; however, the librispeech example sets the prompt to None: https://github.com/X-LANCE/SLAM-LLM/blob/683122402391806512a9d13febe8a952bb7c406e/src/slam_llm/datasets/speech_dataset.py#L112
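For reference, the fallback behaves roughly like this (a minimal sketch, not the exact SLAM-LLM code; the function and argument names are assumptions for illustration):

```python
# Sketch of the prompt fallback described above (not the exact
# SLAM-LLM code; names here are assumptions for illustration).
DEFAULT_PROMPT = (
    "Transcribe speech to text. "
    "Output the transcription directly without redundant content. "
    "Ensure that the output is not duplicated."
)

def resolve_prompt(configured_prompt):
    # When a recipe (e.g. the librispeech example) leaves the prompt unset,
    # the dataset silently falls back to the long default prompt, so
    # training and inference can end up using different prompts.
    if configured_prompt is None:
        return DEFAULT_PROMPT
    return configured_prompt
```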
@fclearner I think the long prompt will also increase training and inference time, which might be an important reason.
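As a rough illustration of that cost (a sketch assuming a HuggingFace tokenizer; the actual tokenizer depends on the LLM backbone), the longer prompt adds a fixed number of extra tokens to every sample:

```python
# Sketch: compare prompt lengths in tokens. The tokenizer choice is an
# assumption; SLAM-LLM's actual backbone tokenizer may differ.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

short_prompt = "Transcribe speech to text. "
long_prompt = (
    "Transcribe speech to text. "
    "Output the transcription directly without redundant content. "
    "Ensure that the output is not duplicated."
)

# Every training/inference sample pays this extra prefix cost.
print(len(tokenizer.encode(short_prompt)))  # e.g. ~7 tokens
print(len(tokenizer.encode(long_prompt)))   # e.g. ~25 tokens
```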
Yes, I agree with you. However, the Seed-ASR paper introduces a context-prompt method that performs well on domain-specific ASR; maybe a small sacrifice is worthwhile.
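For context, the idea there is to prepend domain context (e.g. hotwords or dialogue history) to the transcription instruction. A minimal sketch of that style of prompt construction (the function and field names are hypothetical, not from the paper or SLAM-LLM):

```python
# Hypothetical sketch of Seed-ASR-style context prompting: prepend
# domain context (hotwords, history) before the instruction.
def build_context_prompt(hotwords, instruction="Transcribe speech to text. "):
    context = "Relevant terms: " + ", ".join(hotwords) + ". "
    return context + instruction

print(build_context_prompt(["SLAM-LLM", "Librispeech"]))
# -> "Relevant terms: SLAM-LLM, Librispeech. Transcribe speech to text. "
```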
Hi, @ddlBoJack
I noticed that the training prompt is more complex: "Transcribe speech to text. Output the transcription directly without redundant content. Ensure that the output is not duplicated." However, the inference prompt is much simpler: "speech transcribe."
Is there any research on prompt methods for ASR-LLM that suggests prompts should be simpler during inference?
Thank you in advance for your response.