Open josiahbryan opened 1 week ago
Not with the current model checkpoints, as they are pure audio codec models.
Function calling in the traditional sense requires text generation capability (and typically instruction tuning, although this is not strictly required). Perhaps a future fine-tuned version of hertz-dev could support interleaved text + audio tokens, which would facilitate function calling, RAG, CoT, and other use cases. I hope the authors can comment on whether there are plans in this direction 😉
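To make the idea concrete, here is a minimal sketch of how tool calls could be pulled out of an interleaved text + audio stream, if a future checkpoint emitted one. Everything here is an assumption for illustration: the `(kind, payload)` stream format, the JSON call schema, and the `get_weather` tool are hypothetical and are not part of hertz-dev.

```python
import json

# Hypothetical interleaved output: ("audio", codec_token_id) or ("text", piece).
# The current hertz-dev checkpoints emit no text tokens; this format is assumed.
stream = [
    ("audio", 101),
    ("audio", 57),
    ("text", '{"name": "get_weather", '),
    ("text", '"arguments": {"city": "Paris"}}'),
    ("audio", 902),
]


def get_weather(city):
    # Stand-in tool; a real one would query an external service.
    return f"Weather lookup requested for {city}"


# Registry mapping tool names to callables (hypothetical).
TOOLS = {"get_weather": get_weather}


def split_stream(items):
    """Separate audio token ids from the concatenated text channel."""
    audio = [payload for kind, payload in items if kind == "audio"]
    text = "".join(payload for kind, payload in items if kind == "text")
    return audio, text


def dispatch(text):
    """Parse the text channel as a JSON tool call and run the named tool."""
    call = json.loads(text)
    return TOOLS[call["name"]](**call["arguments"])


audio_tokens, text = split_stream(stream)
result = dispatch(text)
print(result)
```

The audio tokens would go to the codec decoder as usual, while the text channel carries the structured call. With only audio tokens, as in the current checkpoints, there is nothing for `dispatch` to parse.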
Is there a way to get this to do function calling (tool use)?