ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

[Question] Documentation for generative model API and parameters? #129

Open tanmayb123 opened 2 years ago

tanmayb123 commented 2 years ago

I can't seem to find any documentation on how to specify parameters such as max generation length, stop tokens, temperature, etc., for decoder-based models like GPT-2. Currently my API requests generate only a single token, and I'd obviously like to generate more (preferably up until a specified stop token).

ayoub-louati commented 2 years ago

@tanmayb123 We are not currently planning to expose those parameters. You can try either adding the parameters yourself with Triton or passing the parameters you want as JSON.
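
For anyone hitting the same single-token limit, one possible workaround is to drive generation from the client: call the model once per token and apply max length, stop token, and temperature yourself. The sketch below is not the transformer-deploy API. `generate`, `sample_token`, and `toy_model` are hypothetical names, and `next_token_logits` stands in for whatever call returns next-token logits from your deployment (a toy function is used here so the example runs on its own).

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical callback: takes the token ids so far, returns one logit per vocab entry.
NextTokenLogits = Callable[[Sequence[int]], Sequence[float]]


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Pick the next token id; temperature <= 0 means greedy (argmax) decoding."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [logit / temperature for logit in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(
    next_token_logits: NextTokenLogits,
    prompt_ids: Sequence[int],
    max_new_tokens: int = 20,
    stop_token_id: Optional[int] = None,
    temperature: float = 1.0,
    seed: Optional[int] = None,
) -> List[int]:
    """Append tokens one at a time until the stop token or the length limit."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = sample_token(next_token_logits(ids), temperature, rng)
        ids.append(token)
        if token == stop_token_id:
            break
    return ids


def toy_model(ids: Sequence[int]) -> List[float]:
    """Stand-in for a real model: 5-token vocab, strongly prefers (last id + 1) mod 5."""
    target = (ids[-1] + 1) % 5
    return [5.0 if i == target else 0.0 for i in range(5)]


if __name__ == "__main__":
    # Greedy decoding from token 0, stopping when token 4 is produced.
    print(generate(toy_model, [0], max_new_tokens=10, stop_token_id=4, temperature=0.0))
```

With a real deployment, `toy_model` would be replaced by a function that sends the current ids to the server and returns the logits for the last position. That costs one request per generated token, so it is a stopgap rather than a substitute for server-side generation parameters.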