XiongjieDai / GPU-Benchmarks-on-LLM-Inference

Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?

any chance of a 70b q8 column for the inference tables? #18

Open · haydonryan opened this issue 1 month ago

haydonryan commented 1 month ago

This repo is fantastic! It would be really good to include a q8 column, since q4 to fp16 is a big jump on 70b. :)

charleswg commented 3 weeks ago

I second this. Q8 is almost no quality loss compared to fp16 and uses about half the VRAM. A q8 column would be very helpful for deciding which card to buy.
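For rough sizing, here's a back-of-the-envelope sketch (not from the repo's tables) of why q8 sits roughly halfway between fp16 and q4 in VRAM. The bits-per-weight figures are assumptions based on llama.cpp's block formats (Q8_0 stores 8-bit weights plus an fp16 scale per 32-weight block, roughly 8.5 bpw; Q4_0 is roughly 4.5 bpw); actual GGUF file sizes vary a bit, and the KV cache and activations add more on top:

```python
# Rough weight-memory estimate for a nominal 70B-parameter model at
# different quantization levels. Bits-per-weight values are assumed
# effective figures for llama.cpp block formats, not exact file sizes.

PARAMS = 70e9  # nominal parameter count

BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "q8_0": 8.5,  # assumed: 8-bit weights + per-block scale overhead
    "q4_0": 4.5,  # assumed: 4-bit weights + per-block scale overhead
}

for fmt, bpw in BITS_PER_WEIGHT.items():
    gib = PARAMS * bpw / 8 / 1024**3  # bits -> bytes -> GiB
    print(f"{fmt:>5}: ~{gib:.0f} GiB of weights")
```

Under these assumptions this prints roughly 130 GiB (fp16), 69 GiB (q8_0), and 37 GiB (q4_0) for the weights alone, which is why a q8 column would change which cards or Apple Silicon configurations are viable for 70b.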