dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

llamaspeak 2 speech rate parameters are in the wrong format #445

Open rgobbel opened 6 months ago

rgobbel commented 6 months ago

I've already put this in a comment on another thread here, but I'm adding it as a new issue so it shows up in searches.

The Riva TTS agent is failing because the voice rate parameter has the wrong format. It's being sent as a float, but it needs to be a string: either "default" or one of a few other options described here.

I made two changes to the code to get it working. First, in local_llm/agents/web_chat.py, line 42:

-                self.tts.rate = float(msg['tts_rate'])
+                self.tts.rate = f"{float(msg['tts_rate']):.0%}"

...and in plugins/audio/riva_tts.py, line 43:

-        self.rate = voice_rate
+        self.rate = f'{voice_rate:.0%}'
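For illustration, here is a minimal sketch of what the `:.0%` format spec in the two changes above does: it multiplies the float by 100, rounds to zero decimal places, and appends a percent sign. The helper name `format_tts_rate` is hypothetical and not part of local_llm; passing non-numeric strings such as "default" through unchanged is also an assumption about what callers might send.

```python
def format_tts_rate(rate):
    """Hypothetical helper: turn a numeric rate into a percentage string.

    Non-numeric strings (for example "default" or "125%") are returned
    unchanged; that pass-through is an assumption, not library behavior.
    """
    if isinstance(rate, str):
        try:
            rate = float(rate)  # numeric text such as "1.25"
        except ValueError:
            return rate  # already a non-numeric rate string
    # ".0%" multiplies by 100, rounds to 0 decimals, and appends "%"
    return f"{float(rate):.0%}"


print(format_tts_rate(1.0))        # 100%
print(format_tts_rate("1.25"))     # 125%
print(format_tts_rate("default"))  # default
```

Because strings that are already formatted pass through untouched, a helper like this could be called at both of the patched locations without formatting the value twice.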

I'm running the Riva server on a separate PC, and using the --riva-server command line option to talk to it, since Riva has not yet been ported to JP 6.

dusty-nv commented 6 months ago

Thanks @rgobbel - I recently refactored this stuff into this:

https://github.com/dusty-nv/jetson-containers/blob/2fdb601aa6bb284023953d8d3f51f265fbb4dc29/packages/llm/local_llm/plugins/audio/auto_tts.py#L221

Hopefully this works better now. I have been unable to test Riva TTS on JP6 yet (I have Riva ASR working on JP6 in an internal pre-release build), but I will remember this thread for when I do.