Open androstj opened 2 months ago
See these instructions for how to run ICL inference over a dataset with precomputed retrieved data: https://github.com/NVIDIA/audio-flamingo/blob/main/foundation/data/README.md and https://github.com/NVIDIA/audio-flamingo/blob/main/foundation/inference/README.md
@zhifengkongnv may I ask you to prepare a notebook for the inference if you have some time to spare? This repo seems rather complicated to navigate.
I am looking at the https://github.com/NVIDIA/audio-flamingo/blob/main/inference/inference_examples.py file and I couldn't find any example that interleaves multiple audio clips and texts. However, I saw the use case in the paper:
Could you provide some code showing how to run inference with multiple audio-text pairs in an interleaved way?
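To make the request concrete, here is a rough sketch of the kind of interleaved few-shot input I have in mind. It only builds the prompt string and the matching ordered list of audio paths; it does not call anything from this repo. The `<audio>` and `<|endofchunk|>` markers are assumptions borrowed from Flamingo-style models, and `Example` and `build_interleaved_prompt` are hypothetical names, so the real tokens and loading code may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed markers (Flamingo-style); not confirmed against the audio-flamingo code.
AUDIO_TOKEN = "<audio>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Example:
    """Hypothetical container for one in-context (audio, text) pair."""
    audio_path: str
    text: str


def build_interleaved_prompt(
    shots: List[Example], query_audio: str, query_prefix: str = ""
) -> Tuple[str, List[str]]:
    """Return (prompt, audio_paths) with one audio marker per clip, in order."""
    # Each in-context example becomes: <audio> + its text + end-of-chunk marker.
    parts = [f"{AUDIO_TOKEN}{s.text}{END_OF_CHUNK}" for s in shots]
    # The query clip comes last and is left open for the model to complete.
    parts.append(f"{AUDIO_TOKEN}{query_prefix}")
    audio_paths = [s.audio_path for s in shots] + [query_audio]
    return "".join(parts), audio_paths


if __name__ == "__main__":
    shots = [Example("a.wav", "a dog barks"), Example("b.wav", "rain falls")]
    prompt, paths = build_interleaved_prompt(shots, "c.wav", "The sound of")
    print(prompt)
    print(paths)
```

What I am missing is how to pass such a prompt and the list of clips to the model so that the i-th marker attends to the i-th audio.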