Standard-Intelligence / hertz-dev

first base model for full-duplex conversational audio
https://si.inc
Apache License 2.0

Nonsense Responses #29

Open MarcoFerreiraPerson opened 2 weeks ago

MarcoFerreiraPerson commented 2 weeks ago

Hello,

I have been getting nonsense responses from the model after the initial prompt.

The client is running on Ubuntu 24 and the server on Ubuntu 22.

Thanks

SuperMaximus1984 commented 2 weeks ago

Same here, some mooing and hardly distinguishable wording

milkey-mouse commented 2 weeks ago

TBH, since this is a base model, the results are very prompt-dependent. It's easy to push the model out of distribution, and it takes some experimentation to avoid this. But completions for the default "Bob" prompt should at least sound English-like; anything else looks like an inference or audio driver bug.

AbrahamSanders commented 1 week ago

From my experiments using the inference client (not the notebook), the model seems to be very sensitive to the token temperature. Lowering it below 0.8 leads to degenerate repetition, where the model just keeps emitting silence or background-noise tokens. Raising it above 0.9 leads to incoherent, random speech sounds.

My intuition is that other sampling strategies, perhaps top_p or min_p sampling, might yield better results. I'll run some experiments when I have time.
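For anyone who wants to try this, here is a rough sketch of what temperature, top_p, and min_p do to a single next-token distribution. It is plain Python over a list of logits, not hertz-dev's actual sampling code. All function names are hypothetical, and the 0.85 default is only the midpoint of the 0.8 to 0.9 range mentioned above, not a tested recommendation.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, kept):
    """Zero out tokens not in `kept` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in kept)
    return [p / total if i in kept else 0.0 for i, p in enumerate(probs)]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return _renormalize(probs, kept)


def min_p_filter(probs, min_p):
    """Keep tokens with probability >= min_p times the top token's probability."""
    threshold = min_p * max(probs)
    kept = {i for i, p in enumerate(probs) if p >= threshold}
    return _renormalize(probs, kept)


def sample(logits, temperature=0.85, top_p=None, min_p=None, rng=random):
    """Sample one token index (hypothetical helper, not the hertz-dev API)."""
    probs = softmax(logits, temperature)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    if min_p is not None:
        probs = min_p_filter(probs, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

The idea is that both filters cut off the low-probability tail that produces random speech sounds at high temperature, without sharpening the head of the distribution the way a lower temperature does. Whether that actually helps with this model is untested.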