Instead of waiting for the complete speech and/or text response, stream it and act on partial information (if possible).
Along the same lines, stream the speech-to-text as well, if possible. Don’t wait until the user stops talking to start processing the information.
Distract the user with something while your app is doing background tasks (a loading animation, or even status messages like “Opening Firefox… opening Instagram…”).
Answer something instantly after the user's command, like “Alright, I'm already working on that”. This makes a task that took 3 seconds feel like it took 2 (because the awkwardness is in the silence, when nothing is being done or said).
Consider running tasks in parallel when they do not intrinsically depend on each other, for example: “search online for dinosaur images (parallel process 1), create a folder with a specific name on my desktop (parallel process 2), and then build a website using those images along with some related information using HTML and JavaScript (wait for 1 and 2, then do this task)”.
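The dinosaur example can be sketched roughly as follows, assuming a Python app built on asyncio. The three task functions are hypothetical stand-ins (with a sleep simulating I/O latency), not part of any real API; only the fan-out and wait-for-both structure is the point.

```python
import asyncio

# Hypothetical stand-ins for the real tasks; asyncio.sleep simulates I/O latency.
async def search_images(query: str) -> list[str]:
    await asyncio.sleep(0.2)
    return [f"{query}_{i}.jpg" for i in range(3)]

async def create_folder(name: str) -> str:
    await asyncio.sleep(0.1)
    return f"Desktop/{name}"

async def build_website(folder: str, images: list[str]) -> str:
    # Depends on both earlier results, so it runs only once they are available.
    return f"{folder}/index.html with {len(images)} images"

async def handle_command() -> str:
    # Processes 1 and 2 are independent: start them together and wait for both.
    images, folder = await asyncio.gather(
        search_images("dinosaur"),
        create_folder("dino-site"),
    )
    return await build_website(folder, images)

if __name__ == "__main__":
    print(asyncio.run(handle_command()))
```

With this structure the wait before building the website is roughly the duration of the slower of the two independent tasks, rather than the sum of both.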
Felipe Gallo's advice for Omni applies to Open MMPA too: https://discuss.ai.google.dev/t/seeking-feedback-on-omni/38695/3?u=tocsa
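The streaming and instant-answer tips at the top of this list might look something like the sketch below. It assumes the model response arrives as text chunks through an async iterator; `fake_stream` is a hypothetical stand-in for whatever streaming API the app actually uses, and splitting sentences on a period is a deliberate simplification.

```python
import asyncio
from typing import AsyncIterator

async def fake_stream() -> AsyncIterator[str]:
    # Hypothetical stand-in for a streamed model response arriving in chunks.
    for chunk in ["Opening Fire", "fox. Opening Insta", "gram. Done."]:
        await asyncio.sleep(0.05)
        yield chunk

async def sentences(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    # Buffer chunks and emit each sentence as soon as its closing period arrives.
    buffer = ""
    async for chunk in chunks:
        buffer += chunk
        while "." in buffer:
            sentence, buffer = buffer.split(".", 1)
            yield sentence.strip() + "."
    if buffer.strip():
        yield buffer.strip()  # trailing text that never got a period

async def respond() -> list[str]:
    # Acknowledge instantly, before any model output has arrived.
    spoken = ["Alright, I'm already working on that."]
    async for sentence in sentences(fake_stream()):
        spoken.append(sentence)  # a real app would speak or act on it here
    return spoken

if __name__ == "__main__":
    for line in asyncio.run(respond()):
        print(line)
```

Each sentence becomes available to speak or act on as soon as it is complete, so the user hears the acknowledgement and the first status update while later chunks are still streaming in.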