Bryley / neoai.nvim

Neovim plugin for interacting with GPT models from OpenAI
MIT License

Configurable model parameters #28

Open wfjt opened 1 year ago

wfjt commented 1 year ago

Please make temperature, top_p, max_tokens, etc. configurable live in a settings window. E.g., temperature and top_p for code generation and text summarisation are typically different: ~0 temperature and 0.9-1.0 top_p for code generation, and 0.7-0.9 and 30-50 respectively are common ranges for summarisation. A table for these parameters that is passed to the API would be enough. The presence and repeat penalties should also be configurable; these are less of an issue with GPT-3.5 Turbo, but were crucial for the Codex models, which now seem to be history. Still, for summarisation I'd tune these to make the model more likely to come up with new topics and ideas.
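A minimal sketch of what such a table might look like, assuming the OpenAI chat-completion parameter names; the `model_params` key and its placement are hypothetical, not neoai.nvim's actual config:

```lua
-- Hypothetical settings table passed straight through to the OpenAI API.
-- Key names mirror the chat completions request body; whether neoai.nvim
-- exposes them like this is an assumption, not the plugin's real API.
require("neoai").setup({
    model_params = {
        temperature = 0.0,       -- near-0 for code generation
        top_p = 0.95,
        max_tokens = 1024,
        presence_penalty = 0.0,
        frequency_penalty = 0.0, -- the "repeat penalty" mentioned above
    },
})
```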

Lastly, it would be good to show token usage like in the playground and, e.g., turn the prompt background red when prompt + history + completion tokens exceed the maximum, to avoid unnecessarily cut-off messages, lost prompts, and unexpected session refreshes. For instance, I initially tend to set max_tokens to the prompt plus the rest for the completion in case I get a long response, and then tune it down as the chat progresses. I haven't tested, so maybe this is already implemented, but it would be good to have a high-water mark/trim level for the context/history sent to the model, automatically making room for a new prompt and the configured completion tokens by deleting old data. This is how the ChatGPT UI works: you notice it loses context, but it doesn't kill the session. The playground is harsher in this respect, but I much prefer chat.openai.com to the playground.
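A rough sketch of that trimming idea, assuming a crude ~4-characters-per-token estimate; the helper names here are hypothetical and not part of neoai.nvim:

```lua
-- Hypothetical history trimmer: drop the oldest messages until the
-- estimated context size leaves room for the configured completion tokens.
-- The 4-chars-per-token heuristic is a stand-in for a real tokenizer.
local function estimate_tokens(text)
    return math.ceil(#text / 4)
end

local function trim_history(messages, max_context, completion_tokens)
    local budget = max_context - completion_tokens
    local total = 0
    for _, msg in ipairs(messages) do
        total = total + estimate_tokens(msg.content)
    end
    -- Delete old data from the front, like the ChatGPT UI silently does,
    -- so the session survives instead of being refreshed.
    while total > budget and #messages > 1 do
        local removed = table.remove(messages, 1)
        total = total - estimate_tokens(removed.content)
    end
    return messages
end
```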

00sapo commented 1 year ago

Hello, this plugin looks very nice, and the user interface is almost better than CodeGPT/ChatGPT. One thing it's lacking is an easy way to configure the API request for each prompt. We should be able to use it like this, for instance:


```lua
require('neoai').setup{
  shortcuts = {
    {
      name = "a_user_custom_shortcut",
      system_message = "you are an expert {language} developer",
      prompt = "the prompt here",
      model_params = {
        temperature = 0.5,
        other_model_parameters = 0.2,
      },
      inject_mode = 'append', -- or 'replace'
      only_code_snippet = true, -- keep only the first code snippet in the answer (sketched below)
    }
  }
}
```
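For the only_code_snippet behaviour, one way it could work is to pull the first fenced block out of the model's answer; this helper is a hypothetical sketch, not part of the plugin:

```lua
-- Hypothetical helper: extract the first fenced code block from the answer.
-- Falls back to the full answer when no fence is found.
local function first_code_snippet(answer)
    local snippet = answer:match("```%w*\n(.-)\n```")
    return snippet or answer
end
```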