Standard-Intelligence / hertz-dev

first base model for full-duplex conversational audio
https://si.inc
Apache License 2.0
1.61k stars 102 forks

Tool Use / Function Calling #30

Open josiahbryan opened 1 week ago

josiahbryan commented 1 week ago

Is there a way to get this to do function calling (tool use)?

AbrahamSanders commented 1 week ago

Not with the current model checkpoints, as they are pure audio codec models.

Function calling in the traditional sense requires text generation capability (and typically instruction tuning, although this is not strictly required). Perhaps a future fine-tuned version of hertz-dev could support interleaved text and audio tokens, which would facilitate function calling, retrieval-augmented generation (RAG), chain-of-thought (CoT), and other use cases. I hope the authors can comment on whether there are plans in this direction 😉
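
To illustrate why text output matters here, below is a minimal sketch of how function calling typically works on the application side: the model emits a structured tool call as text, and the host program parses it and runs the matching function. Everything in it is an assumption for illustration only. The `get_weather` tool, the `TOOLS` registry, the JSON shape with `name` and `arguments`, and the `dispatch_tool_call` helper are hypothetical and not part of hertz-dev, which currently produces no text to parse.

```python
import json


# Hypothetical tool, used only for illustration; not part of hertz-dev.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(model_text: str):
    """Parse a JSON tool call emitted as text and run the matching tool.

    Returns the tool's result, or None if the text is not a usable tool call.
    """
    try:
        call = json.loads(model_text)
    except json.JSONDecodeError:
        return None  # the model produced ordinary text, not a tool call
    if not isinstance(call, dict):
        return None
    name = call.get("name")
    if not isinstance(name, str):
        return None
    func = TOOLS.get(name)
    if func is None:
        return None  # unknown tool name
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return None
    return func(**args)


# Stand-in for text that a text-capable model might generate.
example_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch_tool_call(example_output))
```

The parsing step is the part an audio-only model cannot feed: without a text (or interleaved text + audio) output stream, there is nothing for the host program to decode into a function name and arguments.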