ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Quantize: use --pure, --output-tensor-type and --token-embedding-type at the same time #8129

Closed ZeusXuan closed 3 months ago

ZeusXuan commented 3 months ago

This PR adjusts the priority of the three quantization options --pure, --output-tensor-type and --token-embedding-type so that the bit precision of the token embedding and the LM head can be changed while all Transformer layers keep the same bit precision.

For example:

./llama-quantize --pure --output-tensor-type Q6_K --token-embedding-type Q3_K ./models/llama3-8b-f16.gguf ./models/llama3-8b-q4_k.gguf Q4_K

This command quantizes all the Transformer layers to Q4_K while keeping the token embedding at Q3_K and the LM head at Q6_K. This may help users build their own quantization strategy based on their own insight.