[BUG / UX improvement] Audio chunks logic is erratic

Describe the bug The initial silence delay and the silence timeout that is meant to stop an audio recording do not behave consistently. Sometimes the initial delay does not last for the configured time, and the first recording attempt generally behaves differently from later ones. That first attempt usually produces corrupt chunks that are not a valid WebM file (this happens in Chrome in particular, not in Firefox). Overall, the behavior is less stable in Chrome than in Firefox.

To Reproduce Open Chrome, refresh the page, and try to save audio chunks in a real workflow; the TTS API throws an unknown audio format exception. Refresh the browser and repeat several times to check the first recording attempt each time.

The test environment was Ubuntu 24.04 with Chrome and Firefox (latest production versions).
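To confirm whether the first chunk is really malformed, one quick check is to look at its leading bytes. WebM is a Matroska profile, and such files begin with the EBML header ID `0x1A 0x45 0xDF 0xA3`. The sketch below is only a diagnostic aid; `looksLikeWebM` is a hypothetical helper name, not something Chainlit provides.

```typescript
// The four-byte EBML header ID that opens every WebM/Matroska file.
const EBML_MAGIC: number[] = [0x1a, 0x45, 0xdf, 0xa3];

// Hypothetical helper (not part of Chainlit): returns true when the
// buffer starts with the EBML magic bytes.
function looksLikeWebM(bytes: Uint8Array): boolean {
  if (bytes.length < EBML_MAGIC.length) return false;
  return EBML_MAGIC.every((expected, i) => bytes[i] === expected);
}
```

In the browser, the bytes of a recorded `Blob` could be obtained with `new Uint8Array(await blob.arrayBuffer())`. Note that when `MediaRecorder` emits data in slices, only the first chunk is expected to carry this header, so the check is meaningful for the first chunk or for the concatenated recording, not for later chunks on their own.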

Expected behavior A smooth, stable recording experience with responsive UI feedback.

Desktop (please complete the following information):

OS: Ubuntu 24
Browser: Chrome, Firefox

Smartphone (please complete the following information): Not tested

UX/UI Suggested improvement

Additionally, I ran usability tests with 15 different users. Only 3 were able to communicate fluently with the assistant. When asked, the other 12 generally said that the recording mechanics were not very intuitive. Most of the users surveyed said the platform would be very easy to use if it followed the behavior of voice notes in WhatsApp or Telegram, for example, since the general public is already very accustomed to that behavior.

I suggest something like a push-to-talk experience, or push once to start recording with no initial silence delay (many times the audio chunk did not capture the entire audio sequence) and push once to stop, with no automatic stop on a silence timeout (many edge cases can appear in real-world conversations). At the very least, these behaviors could be offered as a setting.
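To make the suggestion concrete, here is a minimal sketch of how the recording modes could be modeled as a setting. All names (`RecordMode`, `RecorderSettings`, `nextState`) are hypothetical and are not Chainlit's actual API or configuration. It assumes three modes: `hold` (push to talk), `toggle` (push once to start, push once to stop), and `auto` (like toggle, but silence can also stop the recording). In every mode recording starts immediately on press, with no initial silence delay.

```typescript
// Hypothetical types for illustration only; not Chainlit's API.
type RecorderState = "idle" | "recording";
type RecordMode = "hold" | "toggle" | "auto";

type RecorderEvent =
  | { kind: "press" }
  | { kind: "release" }
  | { kind: "silence"; durationMs: number };

interface RecorderSettings {
  mode: RecordMode;
  // Only consulted in "auto" mode.
  silenceTimeoutMs: number;
}

// Pure transition function: given the current state and an input event,
// return the next state according to the selected mode.
function nextState(
  state: RecorderState,
  event: RecorderEvent,
  settings: RecorderSettings
): RecorderState {
  if (event.kind === "press") {
    // Push to talk: pressing always means "record now".
    if (settings.mode === "hold") return "recording";
    // Toggle and auto: each press flips the state.
    return state === "idle" ? "recording" : "idle";
  }
  if (event.kind === "release") {
    // Only push to talk stops on release.
    return settings.mode === "hold" ? "idle" : state;
  }
  // Silence event: only "auto" mode stops on a long enough silence.
  if (
    settings.mode === "auto" &&
    state === "recording" &&
    event.durationMs >= settings.silenceTimeoutMs
  ) {
    return "idle";
  }
  return state;
}
```

Keeping the transitions in a pure function like this would make the edge cases (for example, a long pause in `toggle` mode) easy to unit test independently of the browser's audio APIs.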

Chainlit / chainlit

[BUG / UX improvement] Audio chunks logic is erratic #1094