Add an option to not quantize the embedding layer when doing quantization.

AI-Hypercomputer / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

Apache License 2.0

Closed qihqi closed 1 month ago

qihqi commented 1 month ago

Leaving the embedding layer unquantized helps preserve output quality for small models such as Gemma 2B.
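To make the request concrete, here is a minimal sketch of what such an option could look like. It is not the jetstream-pytorch implementation: `QuantizeConfig`, `quantize_embedding`, `quantize_weights`, and the weight name `tok_embeddings.weight` are all hypothetical names chosen for illustration, and plain Python lists stand in for tensors so the example runs without any dependencies. The quantizer is a simple symmetric per-row int8 scheme, assumed here only to show where the flag takes effect.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

Matrix = List[List[float]]


@dataclass
class QuantizedWeight:
    """Int8 values plus one floating-point scale per row."""
    values: List[List[int]]
    scales: List[float]


@dataclass
class QuantizeConfig:
    """Hypothetical config; the flag name is an assumption, not a real API."""
    quantize_embedding: bool = True


def quantize_int8_per_row(w: Matrix) -> QuantizedWeight:
    """Symmetric per-row int8 quantization: value ~= int8 * scale."""
    values, scales = [], []
    for row in w:
        max_abs = max((abs(x) for x in row), default=0.0)
        # Avoid division by zero for all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        values.append([max(-127, min(127, round(x / scale))) for x in row])
        scales.append(scale)
    return QuantizedWeight(values, scales)


def quantize_weights(
    weights: Dict[str, Matrix],
    config: QuantizeConfig,
    embedding_keys: Sequence[str] = ("tok_embeddings.weight",),  # assumed name
) -> Dict[str, Union[Matrix, QuantizedWeight]]:
    """Quantize every weight, optionally leaving embedding weights untouched."""
    out: Dict[str, Union[Matrix, QuantizedWeight]] = {}
    for name, w in weights.items():
        if name in embedding_keys and not config.quantize_embedding:
            out[name] = w  # keep the embedding in its original precision
        else:
            out[name] = quantize_int8_per_row(w)
    return out


weights = {
    "tok_embeddings.weight": [[1.0, 0.0, -1.0]],
    "layers.0.wq.weight": [[1.0, 0.0, -1.0]],
}
result = quantize_weights(weights, QuantizeConfig(quantize_embedding=False))
```

With `quantize_embedding=False`, the embedding table is passed through unchanged while the other weight is converted to int8. Keeping the default at `True` would preserve existing behavior for users who do not set the flag.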