SeanEClarke closed this 3 hours ago
Just adding some clarity...
I don't think this needs to maintain multiple states; that would probably complicate the interface, and the simpler these things are, the easier they are to integrate and maintain. If multiple simultaneous VAD streams were needed, it probably makes more sense to run multiple contexts. This is certainly how I use Whisper: I have separate instantiations of the Whisper model/struct, each of which deals only with its own audio stream.
In order to offer the most flexible API, it would be really useful if there could be some state management between successive calls.
This could be where the caller keeps state and passes it back in on every call, something like:
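A minimal sketch of that caller-owned approach, assuming a simple energy threshold with a short hangover. `vad_state`, `vad_process`, and the two constants are hypothetical names for illustration only, not an existing whisper.cpp API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical caller-owned state; not part of any real library API. */
typedef struct {
    int hangover_left; /* frames still reported as speech after energy drops */
} vad_state;

#define VAD_ENERGY_THRESHOLD 0.01f /* assumed mean-square energy threshold */
#define VAD_HANGOVER_FRAMES  2     /* assumed hangover length, in calls */

/* The detector keeps nothing between calls: everything it needs to
 * remember lives in *state, which the caller passes back in each time. */
static bool vad_process(vad_state *state, const float *samples, size_t n) {
    assert(state != NULL && samples != NULL && n > 0);

    /* Mean-square energy of this chunk. */
    float energy = 0.0f;
    for (size_t i = 0; i < n; i++) {
        energy += samples[i] * samples[i];
    }
    energy /= (float)n;

    if (energy > VAD_ENERGY_THRESHOLD) {
        state->hangover_left = VAD_HANGOVER_FRAMES; /* re-arm the hangover */
        return true;
    }
    if (state->hangover_left > 0) {
        state->hangover_left--; /* still inside the hangover window */
        return true;
    }
    return false;
}
```

The caller would zero-initialise one `vad_state` per audio stream (`vad_state st = {0};`) and hand the same struct to every `vad_process` call for that stream.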
or have state managed within a stateful struct, such as:
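A rough sketch of the stateful-struct variant, again assuming a plain energy threshold with hangover. `vad_context` and the `vad_context_*` functions are hypothetical names, not an existing whisper.cpp API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical context that owns both configuration and running state. */
typedef struct vad_context {
    float energy_threshold; /* mean-square energy above this counts as speech */
    int   hangover_frames;  /* calls to keep reporting speech after it stops */
    int   hangover_left;    /* running state carried between calls */
} vad_context;

/* Allocate a context; returns NULL if allocation fails. */
static vad_context *vad_context_init(float energy_threshold, int hangover_frames) {
    vad_context *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) {
        return NULL;
    }
    ctx->energy_threshold = energy_threshold;
    ctx->hangover_frames  = hangover_frames;
    ctx->hangover_left    = 0;
    return ctx;
}

/* Process one chunk; state is read from and written back to the context. */
static bool vad_context_process(vad_context *ctx, const float *samples, size_t n) {
    assert(ctx != NULL && samples != NULL && n > 0);

    float energy = 0.0f;
    for (size_t i = 0; i < n; i++) {
        energy += samples[i] * samples[i];
    }
    energy /= (float)n;

    if (energy > ctx->energy_threshold) {
        ctx->hangover_left = ctx->hangover_frames;
        return true;
    }
    if (ctx->hangover_left > 0) {
        ctx->hangover_left--;
        return true;
    }
    return false;
}

/* Release a context created by vad_context_init. */
static void vad_context_free(vad_context *ctx) {
    free(ctx);
}
```

With this shape, handling several simultaneous streams just means creating one context per stream; each context's state is independent of the others.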