GreenBitAI / green-bit-llm

A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs.
https://huggingface.co/blog/NicoNico/green-bit-llm
Apache License 2.0

Compatibility with huggingface transformers/vllm #20

Closed · greatzane closed this issue 2 days ago

greatzane commented 1 week ago

Can I use huggingface transformers or vllm to load a model generated by Q-SFT and run inference?

NicoNico6 commented 1 week ago

Hi, thanks for your interest in this work.

  1. Huggingface transformers/vllm do not officially support the MBWQLinearCuda layer yet. We can bring our implementation into transformers by manually replacing the standard nn.Linear layers with our own; you may check the details in the make_quant function (a sketch of this replacement pattern is shown after this list): https://github.com/GreenBitAI/green-bit-llm/blob/05b310df9b7eae9970cb25982780443858236a3b/green_bit_llm/common/model.py#L206

     For vLLM, compatibility is planned, please stay tuned.

  2. The adapter generated by Q-SFT is fully compatible with transformers (https://github.com/GreenBitAI/green-bit-llm/blob/05b310df9b7eae9970cb25982780443858236a3b/green_bit_llm/sft/peft_utils/gba_lora.py#L34); see the loading sketch below.
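For point 1, here is a minimal sketch of the layer-replacement pattern that make_quant implements: recursively walking the model and swapping each nn.Linear for a quantized equivalent. `quant_cls` stands in for MBWQLinearCuda, whose real constructor takes additional quantization arguments, so check the make_quant function linked above for the actual call; `skip_names` is likewise just an illustrative default.

```python
import torch.nn as nn

def replace_linear_layers(module: nn.Module, quant_cls, skip_names=("lm_head",)):
    """Recursively swap nn.Linear submodules for a quantized implementation."""
    # Snapshot the children first so we can safely replace them while iterating.
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear) and name not in skip_names:
            # Build the quantized layer with matching shape, then swap it in.
            setattr(module, name, quant_cls(child.in_features,
                                            child.out_features,
                                            bias=child.bias is not None))
        else:
            replace_linear_layers(child, quant_cls, skip_names)
```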
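For point 2, a minimal sketch of loading a Q-SFT adapter with transformers plus the PEFT library, assuming the adapter was saved in standard PEFT format. The paths below are placeholders, and per point 1 the base model must already carry the quantized layers before the adapter is attached.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths: substitute the real base model and Q-SFT adapter directory.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/base-model")

# Attach the LoRA adapter produced by Q-SFT on top of the base model.
model = PeftModel.from_pretrained(base, "path/to/qsft-adapter")

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```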