Open jwang1993 opened 10 months ago
Hello, the paper mentions: “Output Instruction: Lastly, we provide output instruction to further specify the task and desired format for different subtasks, and then the text output begins.”
How are the Output Instruction tokens below used during training and inference? My understanding is that the Output Instruction goes at the end of the prompt, for example: query = f"{audio_url}{sp_prompt}" where sp_prompt is "<|startofanalysis|><|unknown|><|keyword|><|zh|><|notimestamps|><|wo_itn|><|audioset_ontology|>". Is this understanding correct?
"<|caption_audiocaps|>", # Audiocaps caption style
"<|caption_clotho|>", # Clotho caption style
"<|audioset_ontology|>", # Audioset ontology style
"<|caption_plain|>", # plain caption
"<|itn|>", # inversed text normalized
"<|wo_itn|>", # without inversed text normalized
"<|startofentityvalue|>",
"<|endofentityvalue|>",
"<|startofentitytype|>",
"<|endofentitytype|>",
"<|named_entity_recognition|>", # named entity recognition task
"<|audio_grounding|>",
"<|startofword|>",
"<|endofword|>",
"<|delim|>", # delimiter of timestamps pair in audio grounding
"<|emotion_recognition|>", # emotion recognition
"<|music_description|>", # music description
"<|note_analysis|>", # note analysis
"<|pitch|>", # note analysis: pitch
*[f"<|midi_pitch_{i}|>" for i in range(128)], # midi pitch 0-127
"<|velocity|>", # note analysis: velocity
*[f"<|midi_velocity_{i}|>" for i in range(128)], # midi velocity 0-127
"<|sonic|>", # note analysis: sonic
"<|instrument|>", # note analysis: instrument
"<|speaker_meta|>", # meta information of speaker
"<|song_meta|>", # meta information of song
"<|question|>", # AQA: question
"<|answer|>", # AQA: answer
"<|choice|>", # AQA: answer choice
"<|scene|>", # scene recognition
"<|event|>", # sound event
"<|vocal_classification|>", # vocal classification
"<|speech_understanding|>", # speech language understanding
"<|scenario|>", # speech language understanding: scenario
"<|action|>", # speech language understanding: action
"<|entities|>", # speech language understanding: entities
"<|speech_edit|>", # speech edit
'{}<|startofanalysis|><|unknown|><|caption|><|en|><|notimestamps|><|caption_{}|>'
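To make my reading concrete, here is a minimal Python sketch of how I assume the prompt is assembled, with the Output Instruction tags placed last. The helper names `build_sp_prompt` and `build_query`, their parameters, and the `example.wav` path are hypothetical and not taken from the repository; the tag order simply copies the two prompt strings quoted in this issue, and whether training and inference actually work this way is exactly what I am asking.

```python
def build_sp_prompt(task, text_language, output_instructions,
                    audio_language="unknown", timestamp_tag="notimestamps"):
    """Join tag names into one prompt string, Output Instruction tags last.

    Hypothetical helper: the ordering is copied from the prompts quoted in
    this issue, not from the official implementation.
    """
    names = ["startofanalysis", audio_language, task, text_language,
             timestamp_tag, *output_instructions]
    return "".join(f"<|{name}|>" for name in names)


def build_query(audio_url, sp_prompt):
    # Mirrors the f-string from the question: audio reference first, tags after.
    return f"{audio_url}{sp_prompt}"


# "example.wav" is a placeholder path used only for illustration.
sp_prompt = build_sp_prompt("keyword", "zh", ["wo_itn", "audioset_ontology"])
query = build_query("example.wav", sp_prompt)
print(query)
```

Under the same assumption, `build_sp_prompt("caption", "en", ["caption_clotho"])` gives the tag sequence of the caption template above with `clotho` filled in as the caption style.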