Extend llm_build_ffn() to support _scale tensors

ggerganov / llama.cpp

LLM inference in C/C++

MIT License


Extend llm_build_ffn() to support _scale tensors #8103

Closed Eddie-Wang1120 closed 3 months ago

Eddie-Wang1120 commented 3 months ago

- [x] I have read the contributing guidelines
Self-reported review complexity:
- [x] Low
- [ ] Medium
- [ ] High

This PR extends llm_build_ffn() to support _scale tensors. This lets BitNet reuse llm_build_ffn(), and it may also suit other models that use per-tensor quantization.
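To illustrate the idea, here is a minimal, self-contained sketch of a gated FFN in which each projection carries an optional per-tensor scale that is applied to the output of its matrix product. It is not the llama.cpp implementation and does not use ggml: the `Linear` struct and the `apply_linear()`, `silu()` and `toy_ffn()` helpers are hypothetical names invented for this example, and the SiLU-gated layout is an assumption chosen only to keep the sketch concrete.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy projection, NOT a ggml tensor.
struct Linear {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> w;        // row-major weights, rows x cols
    std::optional<float> scale;  // per-tensor scale; nullopt means "no _scale tensor"
};

// y = W * x, followed by the optional per-tensor scale.
static std::vector<float> apply_linear(const Linear & l, const std::vector<float> & x) {
    assert(x.size() == l.cols && l.w.size() == l.rows * l.cols);
    std::vector<float> y(l.rows, 0.0f);
    for (std::size_t r = 0; r < l.rows; ++r) {
        for (std::size_t c = 0; c < l.cols; ++c) {
            y[r] += l.w[r * l.cols + c] * x[c];
        }
    }
    if (l.scale) {
        // The scale is a single value for the whole tensor, so it can be
        // applied once to the result instead of to every weight.
        for (float & v : y) {
            v *= *l.scale;
        }
    }
    return y;
}

static float silu(float v) {
    return v / (1.0f + std::exp(-v));
}

// Assumed layout for this sketch: down( silu(gate(x)) * up(x) ).
static std::vector<float> toy_ffn(const Linear & up, const Linear & gate,
                                  const Linear & down, const std::vector<float> & x) {
    std::vector<float> u = apply_linear(up, x);
    std::vector<float> g = apply_linear(gate, x);
    assert(u.size() == g.size());
    for (std::size_t i = 0; i < u.size(); ++i) {
        u[i] *= silu(g[i]);
    }
    return apply_linear(down, u);
}
```

Because a missing scale simply skips the multiplication, the same function serves models with and without scale tensors, which is the kind of reuse the description is after.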