Closed: pragnakalpdev6 closed this issue 4 years ago
I am working with the GPT-2 1.5B model. It is taking too much time for inference. How can I decrease the time taken by the model? How can I optimize my model?

Unfortunately, the 1.5B model is just really, really big. You can batch your predictions or run on faster hardware, but there isn't much more you can do. Maybe there is some way to use distillation methods or the like to reduce the model size, but I'm not familiar with any research doing so with the GPT-2 model specifically.