FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Add support for symmetric quantization #124

Closed julian-q closed 2 months ago

julian-q commented 1 year ago

This PR adds support for symmetric quantization when compressing and decompressing tensors. This makes it possible to compare symmetric and asymmetric quantization directly.
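For readers unfamiliar with the distinction, here is a minimal pure-Python sketch of the two schemes applied to one group of values. It is not the code in this PR and not FlexLLMGen's API: the function names, the `num_bits` parameter, and the use of plain lists instead of tensors are all assumptions made for illustration. Asymmetric quantization stores a scale and a minimum per group, while symmetric quantization stores only a scale and keeps zero fixed at code 0.

```python
# Illustrative sketch only; all names here are hypothetical, not FlexLLMGen's.

def quantize_asymmetric(values, num_bits=4):
    """Map values onto [0, 2**num_bits - 1] using the group's min and max."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_asymmetric(codes, scale, lo):
    """Invert quantize_asymmetric up to rounding error."""
    return [c * scale + lo for c in codes]


def quantize_symmetric(values, num_bits=4):
    """Map values onto [-qmax, qmax] with zero fixed at code 0."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_symmetric(codes, scale):
    """Invert quantize_symmetric up to rounding error."""
    return [c * scale for c in codes]


def max_abs_error(original, restored):
    """Largest absolute reconstruction error over the group."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    group = [-1.0, -0.5, 0.0, 0.25, 3.0]

    a_codes, a_scale, a_min = quantize_asymmetric(group)
    a_err = max_abs_error(group, dequantize_asymmetric(a_codes, a_scale, a_min))

    s_codes, s_scale = quantize_symmetric(group)
    s_err = max_abs_error(group, dequantize_symmetric(s_codes, s_scale))

    print("asymmetric:", a_codes, "max error", a_err)
    print("symmetric: ", s_codes, "max error", s_err)
```

In this sketch the symmetric variant spends its code range evenly around zero, so a group whose values are skewed to one side tends to get a coarser step size than with the asymmetric variant. That trade-off against the smaller per-group metadata is the kind of comparison the PR description refers to.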