Closed qixueweigitbub closed 1 week ago
To use audio-only mode, first modify lines 7-10 of video_salmonn/config/test.yaml to:
all_decode_info: [
["audio", "audio_input", "Your example audio-only json file"]
]
Then, in your audio-only json file, use the same format as example.json, but provide only a single path (the audio file) for "image_name". For example, one data item could be the following:
{
  "image_name": "./dummy/4405327307.wav",
  "conversation": [
    {
      "from": "human",
      "value": "Describe the audio in detail"
    },
    {
      "from": "gpt",
      "value": "None"
    }
  ]
}
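If you have many audio files, a small script can generate the json file for you. The following is only a minimal sketch, not part of the repository: the helper names (`build_audio_item`, `write_audio_only_json`) are hypothetical, and it assumes the file is a top-level JSON list of items like the one above, so please check that against example.json before relying on it.

```python
import json
from pathlib import Path


def build_audio_item(audio_path, prompt="Describe the audio in detail"):
    """Build one data item in the format shown above.

    "image_name" holds the single audio path; the "gpt" turn is the
    placeholder string "None", as in the example item.
    """
    return {
        "image_name": str(audio_path),
        "conversation": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": "None"},
        ],
    }


def write_audio_only_json(audio_paths, out_file, prompt="Describe the audio in detail"):
    """Write one item per audio path to out_file and return the items.

    Assumption: the json file is a top-level list of items
    (verify this against example.json).
    """
    items = [build_audio_item(p, prompt) for p in audio_paths]
    Path(out_file).write_text(json.dumps(items, indent=2), encoding="utf-8")
    return items
```

For example, `write_audio_only_json(["./dummy/4405327307.wav"], "audio_only.json")` would write a file containing the single item shown above; the output file name here is arbitrary, and its path is what goes in the third field of the all_decode_info entry.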
In audio-only mode, the performance is worse and less robust to noise than SALMONN's, because we used much less audio/speech training data than SALMONN did. Please compare the ASR/AAC numbers in the two papers to see the performance difference.
Feel free to reopen this issue if there are still problems.
Dear authors, thanks for your great work and contribution to the research community.
In my use case, I need to use the video-salmonn model for reasoning on audio files only. I know I can use the original SALMONN model for audio reasoning, but my deployment cannot hold two large models, so is it possible to input just an audio file to video-salmonn and get outputs?
And if yes, will the performance on the audio modality be similar to SALMONN, which is trained for the audio modality?