It would be great if the Llama 2 13B AWQ 4-bit quantized model currently used were upgraded to the Llama 3 8B model. It can be quantized the same way (see the sketch below). This would have several advantages:
- Llama 3 8B performs significantly better on all benchmarks.
- Being an 8B model instead of a 13B one, it could reduce the VRAM requirement from 8 GB to 6 GB, enabling popular GPUs such as the RTX 3050, RTX 3060 Laptop and RTX 4050 Laptop to run this demo.
- It would be more than 50% faster: 4-bit decoding is largely memory-bandwidth bound, so throughput scales roughly with parameter count (13B / 8B ≈ 1.6x).
The models are available at: https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6
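For reference, here is a minimal sketch of how the AWQ 4-bit quantization could be reproduced for Llama 3 8B with the AutoAWQ library. The output directory name is an assumption, and the quantization config mirrors AutoAWQ's common 4-bit, group-size-128 defaults rather than this project's exact settings:

```python
# Minimal sketch: AWQ 4-bit quantization of Llama 3 8B with AutoAWQ.
# Assumptions: AutoAWQ is installed (pip install autoawq) and access to
# the gated meta-llama repo has been granted on Hugging Face.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"
quant_path = "Meta-Llama-3-8B-Instruct-AWQ"  # hypothetical output directory

# 4-bit weights with group size 128, analogous to the current 13B AWQ model.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run calibration and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized checkpoint for reuse.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```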