Closed — greatzane closed this issue 2 days ago
Hi, thanks for your attention to this work.
Hugging Face transformers and vLLM do not officially support the MBWQLinearCuda layer yet. We can bring our implementation into transformers by manually replacing the standard nn.Linear layers with it. You can check the details in the make_quant function: https://github.com/GreenBitAI/green-bit-llm/blob/05b310df9b7eae9970cb25982780443858236a3b/green_bit_llm/common/model.py#L206
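For illustration, here is a minimal sketch of the layer-replacement pattern that make_quant relies on: walk the model, find nn.Linear modules, and swap each one for a quantized replacement. `QuantLinearPlaceholder` below is a hypothetical stand-in for MBWQLinearCuda (whose real constructor arguments differ), so treat this as a sketch of the idea rather than the actual green-bit-llm code.

```python
import torch.nn as nn


class QuantLinearPlaceholder(nn.Module):
    """Hypothetical stand-in for MBWQLinearCuda (illustration only)."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        # A real quantized layer would hold packed low-bit weights instead
        # of a plain float nn.Linear.
        self.inner = nn.Linear(in_features, out_features, bias=bias)

    def forward(self, x):
        return self.inner(x)


def replace_linears(module: nn.Module):
    """Recursively replace every nn.Linear child with the quantized layer."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            quant = QuantLinearPlaceholder(
                child.in_features, child.out_features, bias=child.bias is not None
            )
            setattr(module, name, quant)
        else:
            replace_linears(child)


# Usage sketch: load a transformers model, then swap its linear layers.
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("some/model")
# replace_linears(model)
```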
For vLLM, compatibility is planned; please stay tuned.
The adapter generated by Q-SFT is fully compatible with transformers (https://github.com/GreenBitAI/green-bit-llm/blob/05b310df9b7eae9970cb25982780443858236a3b/green_bit_llm/sft/peft_utils/gba_lora.py#L34).
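As a minimal sketch of attaching such an adapter, assuming the quantized base model has already been prepared with green-bit-llm (e.g. via its make_quant path) and that the Q-SFT adapter was saved in standard PEFT format, loading follows the usual PEFT flow; the path below is a placeholder.

```python
from peft import PeftModel

# base_model = ...  # quantized model prepared with green-bit-llm (assumption)
# adapter_path = "path/to/q-sft-adapter"  # placeholder path
# model = PeftModel.from_pretrained(base_model, adapter_path)
# model.eval()  # ready for inference
```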
Can I use huggingface transformers or vllm to load the model generated by Q-SFT and run inference?