fixie-ai / ultravox

A fast multimodal LLM for real-time voice
https://ultravox.ai
MIT License

datasets: Handle converting `int16` audio data in `VoiceSample`. #26

Closed shaper closed 3 months ago

shaper commented 3 months ago

We saw VoiceSample fail its float32 assertion when playing around with the Gradio infer app and submitting an mp3 file, because the audio data arrived as int16 rather than float32. We didn't dig deeper into Gradio (I'm sure it's possible to convert the data there as well), but it seems useful for VoiceSample to handle int16 audio data in addition to what it already handles.
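A minimal sketch of the conversion, assuming int16 PCM samples should be scaled into the float32 range [-1.0, 1.0). `to_float32_audio` is a hypothetical helper for illustration, not the code merged in this PR:

```python
import numpy as np
from numpy.typing import NDArray


def to_float32_audio(audio: np.ndarray) -> NDArray[np.float32]:
    """Hypothetical helper: normalize int16 PCM samples to float32 in [-1.0, 1.0)."""
    if audio.dtype == np.int16:
        # int16 spans [-32768, 32767]; dividing by 32768 maps it into [-1.0, 1.0).
        return audio.astype(np.float32) / 32768.0
    if audio.dtype == np.float32:
        # Already in the expected format; return the array unchanged.
        return audio
    raise ValueError(f"Unsupported audio dtype: {audio.dtype}")
```

Dividing by 32768 rather than 32767 keeps the most negative sample at exactly -1.0 and every positive sample strictly below 1.0.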

shaper commented 3 months ago

If it's easy, I suggest adding a test to datasets_test.py, à la the existing create_sample.

Added tests. Using some type hints in the test required narrowing the VoiceSample.audio type from `audio: Optional[np.ndarray] = None` to `audio: Optional[NDArray[np.float32]] = None`, which I think is beneficial (and it didn't create any type issues elsewhere).
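A sketch of how the narrowed annotation and an int16 test might fit together. `VoiceSampleSketch` is a hypothetical stand-in that models only the `audio` field, and the int16 scaling in `__post_init__` is an assumption; the real VoiceSample has other fields and may convert differently. Note that `NDArray[np.float32]` only informs static type checkers, so the runtime dtype check is still needed:

```python
import dataclasses
from typing import Optional

import numpy as np
from numpy.typing import NDArray


@dataclasses.dataclass
class VoiceSampleSketch:
    """Hypothetical stand-in for VoiceSample; only the audio field is modeled."""

    # Narrowed annotation: type checkers now see the element dtype, not just ndarray.
    audio: Optional[NDArray[np.float32]] = None

    def __post_init__(self) -> None:
        if self.audio is None:
            return
        if self.audio.dtype == np.int16:
            # Assumed conversion: scale int16 PCM into [-1.0, 1.0) as float32.
            self.audio = self.audio.astype(np.float32) / 32768.0
        # The annotation is not enforced at runtime, so keep an explicit check.
        assert self.audio.dtype == np.float32, f"Unexpected dtype: {self.audio.dtype}"


def test_int16_audio_is_converted() -> None:
    # Build a sample from int16 input and check it ends up as normalized float32.
    pcm = np.array([0, 16384, -16384], dtype=np.int16)
    sample = VoiceSampleSketch(audio=pcm)
    assert sample.audio is not None
    assert sample.audio.dtype == np.float32
    np.testing.assert_allclose(sample.audio, [0.0, 0.5, -0.5])


test_int16_audio_is_converted()
```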

Will wait a bit to merge in case there's feedback on the tests.

shaper commented 3 months ago

@juberti I need you or @farzadab to land this, as I don't have write access. Excitement! Thanks for the onboarding guidance.

farzadab commented 3 months ago

Merged and gave you write access. Thanks!