MeetKai / functionary

Chat language model that can use tools and interpret the results
MIT License

FP8 version #224

Open themrzmaster opened 2 months ago

themrzmaster commented 2 months ago

Thanks for your work! It would be nice to have FP8 versions available on HF, as vLLM has special kernels for it and FlashAttention 3 is moving in that direction too.

Thanks

khai-meetkai commented 2 months ago

Hi @themrzmaster, do you mean 8-bit AWQ? Which version are you interested in, v2.5 or v3?

themrzmaster commented 2 months ago

v3! thanks

localmind-ai commented 1 month ago

@themrzmaster @khai-meetkai you can live-quantize with `--quantization fp8` when launching the included vLLM script; there's no need for dedicated quantized models. The only caveat is that you still need to download the regular weights, but after that, quantization works fine. Already tested on the latest medium functionary.
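For reference, a minimal launch sketch along those lines (the entry-point name `server_vllm.py` and the checkpoint name are assumptions based on this thread; adjust to your setup):

```shell
# Download the regular BF16 weights first, then let vLLM quantize to FP8
# at load time. FP8 requires supporting hardware (e.g. Hopper-class GPUs).
python server_vllm.py \
    --model meetkai/functionary-medium-v3.1 \
    --quantization fp8
```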

localmind-ai commented 1 month ago

@khai-meetkai one more note on AWQ quants (you probably know this already, but just in case): it's quite important that the calibration dataset aligns with the function-calling use case, so it's probably a good idea to calibrate not just on a default dataset but to mix in your own data (with some function-calling samples).

This makes AWQ quants (especially 4-bit) a bit more optimized and reliable. We tested this on some of the older medium functionary models and got better results by expanding the AWQ calibration dataset with synthetically generated function-calling data from your original model.
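A minimal sketch of that mixing step (plain Python; the `build_calib_set` helper, the sample lists, and the ratio are hypothetical — the resulting list would then be passed to whatever AWQ toolkit you use as its calibration data):

```python
import random


def build_calib_set(general_samples, fc_samples, n=128, fc_ratio=0.5, seed=0):
    """Mix general-text and function-calling samples for AWQ calibration.

    fc_ratio controls roughly what fraction of the n calibration
    samples come from function-calling data; seed makes it reproducible.
    """
    rng = random.Random(seed)
    n_fc = min(int(n * fc_ratio), len(fc_samples))
    n_gen = min(n - n_fc, len(general_samples))
    mixed = rng.sample(fc_samples, n_fc) + rng.sample(general_samples, n_gen)
    rng.shuffle(mixed)
    return mixed
```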

khai-meetkai commented 1 month ago

Hi @localmind-ai, thank you for reminding us! Yes, the calibration dataset should also be function-calling data. Currently we don't have any plans for creating AWQ quants, as we have more urgent tasks, but we will definitely use function-calling data for calibration if we do.

localmind-ai commented 1 month ago

Thanks for the information @khai-meetkai! Fully understandable.

khai-meetkai commented 1 month ago

@localmind-ai We have just released meetkai/functionary-medium-v3.1-fp8, using a small part of the training data as calibration data. In our evaluation, the quantized model gave almost the same results as the original model.
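For anyone picking this up, a launch sketch for the released checkpoint (the `server_vllm.py` entry-point name is an assumption carried over from this thread):

```shell
# The checkpoint is already FP8-quantized, so no --quantization flag is
# needed; vLLM picks up the quantization config stored with the weights.
# FP8-capable hardware is still required.
python server_vllm.py --model meetkai/functionary-medium-v3.1-fp8
```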