Ughuuu closed 4 months ago
With this change I am using the small model and getting times of 0.1 to 0.3 s (sometimes even lower). I decreased maxTokens to 16 (this can be changed from the project params). For realtime, I also decreased the max time to decode to 10 s, and made audio_ctx dynamic, based on how much time there is to process.
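As a rough illustration of the dynamic audio_ctx idea (not the code in this PR), here is a minimal Python sketch. It assumes the standard Whisper encoder layout of 1500 audio context positions for a 30 s window, i.e. 50 positions per second. The function name `dynamic_audio_ctx` and the `min_ctx` floor are hypothetical and chosen only for this example.

```python
import math

# Whisper's encoder uses 1500 context positions for a 30 s window,
# which works out to 50 positions per second of audio.
FULL_AUDIO_CTX = 1500
POSITIONS_PER_SECOND = 50

def dynamic_audio_ctx(duration_s: float, min_ctx: int = 64) -> int:
    """Pick an audio context size proportional to the audio duration.

    min_ctx is a hypothetical lower bound (an assumption for this sketch),
    used so very short clips still get a usable context.
    """
    if duration_s <= 0:
        return min_ctx
    needed = math.ceil(duration_s * POSITIONS_PER_SECOND)
    # Clamp between the floor and the full 30 s context.
    return max(min_ctx, min(FULL_AUDIO_CTX, needed))

if __name__ == "__main__":
    for seconds in (0.5, 2.0, 10.0, 30.0):
        print(seconds, dynamic_audio_ctx(seconds))
```

With a 10 s cap on the realtime buffer, a sketch like this would never request more than 500 positions, a third of the full context, which is the kind of reduction that can shorten encode time.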