maciekpoplawski opened this issue 2 weeks ago
In case somebody else ends up here with this problem: to get inference_client.py running on Ubuntu Linux, I was missing this:
sudo apt-get install libportaudio2
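If you're not sure whether the fix is needed, here is a small stdlib-only sketch that checks whether the PortAudio shared library (what `libportaudio2` provides) is resolvable on your system. This assumes the client talks to audio devices through PortAudio (e.g. via the `sounddevice` package); the check itself needs nothing beyond the standard library.

```python
import ctypes.util

# Ask the dynamic loader whether a "portaudio" shared library is visible.
# On Ubuntu this is installed by the libportaudio2 package.
lib = ctypes.util.find_library("portaudio")
if lib is None:
    print("libportaudio not found - try: sudo apt-get install libportaudio2")
else:
    print(f"PortAudio found: {lib}")
```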
same problem!!
@wpq3142 did the libportaudio2 fix work for you? I've added it to the README.
Sorry, I mixed two things into one issue. The original issue from the first post is not resolved. The libportaudio2 fix was needed on Ubuntu to be able to select audio devices, and it WORKS.
How much VRAM does it require? I have a 16GB 3060 and got CUDA out of memory.
With our current bfloat16 implementation, 24GB.
Will there be a quantized or otherwise optimized build in the future?
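For a rough sense of what quantization would buy: a weights-only back-of-the-envelope, assuming a model of about 12B parameters (the thread doesn't state the actual size, so that number is purely illustrative). Activations, KV cache, and CUDA overhead come on top of the weights.

```python
# Weights-only VRAM estimate. PARAMS is an assumption for illustration --
# the thread does not state the model's actual parameter count.
PARAMS = 12e9

def weights_gib(bytes_per_param: float) -> float:
    """GiB needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 2**30

print(f"bf16: {weights_gib(2):.1f} GiB")    # ~22.4 GiB, consistent with '24GB'
print(f"int8: {weights_gib(1):.1f} GiB")    # ~11.2 GiB, would fit a 16GB card
print(f"int4: {weights_gib(0.5):.1f} GiB")  # ~5.6 GiB
```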
Hi! Good job on the model.
But I'm having trouble testing it. Setup: RTX 4090 + 64GB RAM (while loading the models I'm hitting 63.9GB :) ).
Tested on Windows: it can't launch with the default code because FLASH_ATTENTION is not supported, so I swapped it for EFFICIENT_ATTENTION. I hear the initial prompt with "bob how's it going bob" and then silence.
Unfortunately, same on Linux (a setup that shows no errors). Only the initial prompt and nothing more. Silence :(
Torch installed with this command:
pip3 install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
Any tips on how to debug this further?