maciekpoplawski opened this issue 2 weeks ago
In case somebody else ends up here with this problem: to get inference_client.py running on Ubuntu Linux, I was missing this:
sudo apt-get install libportaudio2
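If you're not sure whether the fix is needed, here is a small stdlib-only sketch that checks whether the PortAudio shared library (what `libportaudio2` provides) is resolvable on your system. This assumes the client talks to audio devices through PortAudio (e.g. via the `sounddevice` package); the check itself needs nothing beyond the standard library.

```python
import ctypes.util

# Ask the dynamic loader whether a "portaudio" shared library is visible.
# On Ubuntu this is installed by the libportaudio2 package.
lib = ctypes.util.find_library("portaudio")
if lib is None:
    print("libportaudio not found - try: sudo apt-get install libportaudio2")
else:
    print(f"PortAudio found: {lib}")
```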
same problem!!
@wpq3142 did the libportaudio2 fix work for you? I've added it to the README.
Sorry, I mixed two things into one issue. The original issue from the first post is not resolved. The libportaudio2 fix was needed on Ubuntu to be able to select audio devices, and it WORKS.
How much VRAM does it require? I have a 16GB 3060 and got CUDA out of memory.
With our current bfloat16 implementation, 24GB.
Will there be a quantized or otherwise optimized build in the future?
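For a rough sense of what quantization would buy: a weights-only back-of-the-envelope, assuming a model of about 12B parameters (the thread doesn't state the actual size, so that number is purely illustrative). Activations, KV cache, and CUDA overhead come on top of the weights.

```python
# Weights-only VRAM estimate. PARAMS is an assumption for illustration --
# the thread does not state the model's actual parameter count.
PARAMS = 12e9

def weights_gib(bytes_per_param: float) -> float:
    """GiB needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 2**30

print(f"bf16: {weights_gib(2):.1f} GiB")    # ~22.4 GiB, consistent with '24GB'
print(f"int8: {weights_gib(1):.1f} GiB")    # ~11.2 GiB, would fit a 16GB card
print(f"int4: {weights_gib(0.5):.1f} GiB")  # ~5.6 GiB
```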
Hi! Good job on the model.
But I'm having trouble testing it. Setup: RTX 4090 + 64GB RAM (while loading the models I'm hitting 63.9GB :) ).
Tested on Windows: it can't launch with the default code because FLASH_ATTENTION is not supported, so I swapped it for EFFICIENT_ATTENTION. I hear the initial prompt with "bob how's it going bob" and then silence.
Unfortunately, same on Linux (a setup that shows no errors). Only the initial prompt and nothing more. Silence :(
Torch installed with this command:
pip3 install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
Any tips on how to debug this further?