AI-Hypercomputer / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0
41 stars, 15 forks

Add an option to not quantize embedding layer when doing quantization. #191

Closed: qihqi closed this 1 month ago

qihqi commented 1 month ago

Leaving the embedding layer unquantized helps small models (e.g. Gemma 2B) retain better output quality.
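
To illustrate the idea, here is a minimal, framework-free sketch of what such an option could look like. It is not the jetstream-pytorch implementation: the names `quantize_int8`, `quantize_weights`, `quantize_embedding`, and `embedding_key`, as well as the weight names, are hypothetical, and plain Python lists stand in for tensors. It assumes symmetric per-row int8 weight quantization.

```python
from typing import Dict, List, Tuple, Union

Matrix = List[List[float]]
# A quantized weight is (int8 rows, per-row scales).
Quantized = Tuple[List[List[int]], List[float]]


def quantize_int8(weight: Matrix) -> Quantized:
    """Symmetric per-row int8 quantization (hypothetical helper)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Avoid division by zero for an all-zero row.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def quantize_weights(
    weights: Dict[str, Matrix],
    quantize_embedding: bool = True,  # hypothetical option name
    embedding_key: str = "embed",     # assumed naming convention for embeddings
) -> Dict[str, Union[Matrix, Quantized]]:
    """Quantize every weight, optionally leaving embedding weights untouched."""
    out: Dict[str, Union[Matrix, Quantized]] = {}
    for name, w in weights.items():
        if embedding_key in name and not quantize_embedding:
            out[name] = w  # keep the embedding in its original precision
        else:
            out[name] = quantize_int8(w)
    return out


# Toy weights; the names are made up for this example.
weights = {
    "embed.weight": [[0.5, -1.0], [0.25, 0.75]],
    "layers.0.linear.weight": [[2.0, -4.0], [1.0, 0.5]],
}
result = quantize_weights(weights, quantize_embedding=False)
```

With `quantize_embedding=False`, `result["embed.weight"]` is the original float matrix, while the linear weight becomes an `(int8 rows, scales)` pair. The trade-off is a larger memory footprint for the embedding table in exchange for avoiding quantization error on it.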