Open causten opened 11 months ago
The quantization workflow currently uses the FX graph mode pipeline, which does not support transformer models. Quantization therefore needs to be added to the torch.compile (PT2E) workflow instead.
Reference: https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html
Add one example to the repo and to DLM.