NVIDIA / audio-flamingo

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
MIT License
173 stars 10 forks

inference example for interleave multiple audio-text #12

Open androstj opened 2 months ago

androstj commented 2 months ago

I am looking at https://github.com/NVIDIA/audio-flamingo/blob/main/inference/inference_examples.py and couldn't find any example that interleaves multiple audios and texts. However, I saw this use case in the paper: [image]

Could you provide some code showing how to run inference with multiple audio-text pairs in an interleaved way?

zhifengkongnv commented 2 months ago

See these instructions for how to run ICL inference over a dataset with precomputed retrieved data: https://github.com/NVIDIA/audio-flamingo/blob/main/foundation/data/README.md and https://github.com/NVIDIA/audio-flamingo/blob/main/foundation/inference/README.md
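
For readers who want a feel for what "interleaved" input means before digging into those READMEs, here is a minimal sketch of how few-shot audio-text pairs might be laid out ahead of a query audio. This is not the repository's API: the `Shot` class, the `build_interleaved_prompt` function, the file names, and the `<audio>` / `<|endofchunk|>` placeholder strings are all assumptions made for illustration, and the real special tokens and data format are whatever the linked READMEs and configs define.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placeholder tokens (assumption): the actual special tokens
# are defined by the repository's tokenizer and config, not by this sketch.
AUDIO_TOKEN = "<audio>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Shot:
    """One in-context example: an audio file and its paired text."""
    audio_path: str
    text: str


def build_interleaved_prompt(
    shots: List[Shot], query_audio: str, query_prefix: str = ""
) -> Tuple[str, List[str]]:
    """Return (prompt, audio_paths) with one audio placeholder per audio.

    Each in-context shot becomes "<audio>text<|endofchunk|>"; the query
    audio comes last and is left open so a model could continue the text.
    """
    parts: List[str] = []
    audio_paths: List[str] = []
    for shot in shots:
        parts.append(f"{AUDIO_TOKEN}{shot.text}{END_OF_CHUNK}")
        audio_paths.append(shot.audio_path)
    # The query chunk has no end-of-chunk marker: generation starts here.
    parts.append(f"{AUDIO_TOKEN}{query_prefix}")
    audio_paths.append(query_audio)
    return "".join(parts), audio_paths


# Illustrative usage with made-up file names.
shots = [
    Shot("dog.wav", "A dog barks twice."),
    Shot("rain.wav", "Rain falls on a roof."),
]
prompt, paths = build_interleaved_prompt(shots, "query.wav", "The audio contains")
print(prompt)
print(paths)
```

The point of the sketch is only the ordering: the i-th audio placeholder in the text corresponds to the i-th entry in the audio list, so the examples and the query stay aligned. How those audios are actually loaded, encoded, and passed to the model is covered by the instructions linked above.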

SoshyHayami commented 1 month ago

@zhifengkongnv May I ask you to prepare a notebook for inference if you have some time to spare? This repo seems rather complicated to navigate.