Open causten opened 11 months ago
The quantization workflow currently uses the FX graph mode pipeline, which does not support transformer models. Quantization therefore needs to be added to the torch.compile (PT2E) workflow instead.
Reference: https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html
Add one example to the repo and to DLM.