Open dingdongwang opened 9 months ago
Hi there,
It really depends on your GPU, but in general, 1 minute would be fine.
Our code supports 10 seconds of audio (hard-coded) at 3.2 Hz, which gives 32 audio tokens. Adding about 100-200 text tokens, the total is ~200 tokens.
For 1 minute, you would need 192 audio tokens; with the 100-200 text tokens, that comes to ~400 tokens in total, roughly double our cost. You would also need some engineering effort to change the hard-coded part.
-Yuan
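For anyone who wants to redo the arithmetic in the reply above for other clip lengths, here is a minimal sketch. It only uses the numbers quoted in the reply (3.2 Hz audio token rate, 100-200 text tokens); the function names `audio_tokens` and `total_tokens` are made up for illustration and are not part of the repo's code.

```python
# Rough token-budget estimate based on the figures quoted in the reply.
# The names below are hypothetical helpers, not the project's API.

AUDIO_TOKEN_RATE_HZ = 3.2  # audio tokens per second, as stated in the reply


def audio_tokens(duration_s: float, rate_hz: float = AUDIO_TOKEN_RATE_HZ) -> int:
    """Number of audio tokens for a clip of the given duration in seconds."""
    return round(duration_s * rate_hz)


def total_tokens(duration_s: float, text_tokens: int) -> int:
    """Audio tokens plus an assumed number of text tokens."""
    return audio_tokens(duration_s) + text_tokens


if __name__ == "__main__":
    for seconds in (10, 60):
        low = total_tokens(seconds, 100)   # lower end of the text-token range
        high = total_tokens(seconds, 200)  # upper end of the text-token range
        print(f"{seconds}s: {audio_tokens(seconds)} audio tokens, "
              f"{low}-{high} tokens in total")
```

This only estimates sequence length; it does not change the hard-coded 10-second limit mentioned in the reply.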
Hi, may I ask what the maximum allowable length is for audio input? Would a 1-minute WAV file be within the acceptable range?
Thank you!