Open carter54 opened 4 years ago
@TaoLv
Sorry for missing the message. We're working on cleaning up the code and the solution. Hope we can have a PR soon. I'm not familiar with the status of GPT-2 in GluonNLP. Could you please point me to the scripts, and let me know whether it can be exported as a static model?
Yes, a static GPT-2 model was recently supported: https://github.com/dmlc/gluon-nlp/pull/1010
Thanks for the replies. @leezu @TaoLv Looking forward to trying int8 BERT and GPT-2 soon~
@carter54 FYI, here is the PR for BERT quantization: #1080
@TaoLv Thanks for the work. Can this method be applied to the GPT-2 model?
Description
The release notes at https://github.com/dmlc/gluon-nlp/releases/tag/v0.8.1 mention that BERT int8 quantization is presented in the blog post https://medium.com/apache-mxnet/optimization-for-bert-inference-performance-on-cpu-3bb2413d376c, but the blog only shows some results of the BERT quantization tests.
When will this work be released, and can we apply this quantization method to GPT-2?
Thanks a lot for the great work!
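For readers unfamiliar with what int8 quantization does to the weights and activations, here is a minimal NumPy sketch of symmetric int8 quantization (round to int8 with a per-tensor scale, then dequantize). This is only an illustration of the general technique; it is not the GluonNLP/MXNet implementation, and the function names are hypothetical.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map the float range
    # [-max|x|, +max|x|] onto the int8 range [-127, 127].
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original floats.
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# x_hat is close to x; the rounding error is bounded by scale / 2.
```

In a real deployment the scales are chosen by calibrating on sample data (as the BERT quantization PR does), and the int8 matrix multiplications run in fused low-precision kernels rather than being dequantized eagerly like this.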