Open haydonryan opened 1 month ago
This repo is fantastic! It would be really good to include Q8 as well. Going from Q4 to fp16 is a big jump on 70B. :)
I second this. Q8 has almost no quality loss compared to fp16 and uses half the VRAM. It'd be very helpful for deciding which card to use.
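For a rough sense of the gap being discussed, here is a back-of-the-envelope sketch of the weights-only footprint at each precision. It assumes nominal bit widths (4, 8, 16 bits per weight) and ignores the KV cache, activations, and the per-block scale data that real quantized formats add, so actual usage will be somewhat higher. The function name `weight_memory_gb` is made up for illustration.

```python
# Nominal bits per weight. Assumption: real quantized formats store extra
# scale data per block, so actual files are somewhat larger than this.
BITS_PER_WEIGHT = {"q4": 4, "q8": 8, "fp16": 16}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); hypothetical helper."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    # 70B parameters, as in the model size mentioned above.
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"70B {name}: ~{weight_memory_gb(70e9, bits):.0f} GB")
```

Under these assumptions Q8 lands halfway between Q4 and fp16 on a 70B model, which is why it matters for picking a card.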