Closed: ioma8 closed this issue 11 months ago
I would like to see Azure OpenAI 🙏 (not local, I know, but jumping in here anyway)
I'd definitely like to provide options for other models, but they have to be good enough to both A) write good code and B) respond in the required format. One option here would be to fine-tune a model to always respond in the required format, or force it to output in correct format using grammar-based-sampling.
@profplum700 do you have access to GPT-4 on Azure? I was having trouble getting access to it myself. Regardless, we can definitely add that.
@biobootloader I only have GPT-3.5 access. I could help test that only.
I'd suggest adding an endpoint URL for the config. That way, we could use any OpenAI-compatible API.
Please do. Love these coding assistants, but until they can be used with a locally hosted LLM, they cost money to use. What about WizardCoder?
I tried WizardCoder, and the problem is that it's not able to follow the required format. It can produce the code, but sadly it cannot create any files. I tried tweaking it, with no luck.
More reasons to support custom LLM endpoints. There is a service https://openrouter.ai/docs#api-keys allowing us to use gpt-4-32k and claude-2 APIs. All we need is the ability to edit the endpoint.
@biobootloader Realistically, WizardCoder is the only open model available right now that can write decent code. Grammar-based sampling works perfectly for responding in a particular format, but the ggml StarCoder program currently lacks support for grammar-based sampling. I've got access to Azure GPT-4 (after applying to literally 3 waitlists) and it's much faster than OpenAI's endpoints in terms of latency and tok/s.
@ztxtz this would be great!
Awesome. I'll check this out
@zakkor Yeah, grammar-based sampling could be a solution for local models that aren't smart enough to k-shot the output format from examples in the system prompt. Another option would be to fine-tune them to always respond in the mentat change format.
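For context on what grammar-based sampling involves: in llama.cpp it is driven by a GBNF grammar file that constrains which tokens the model may emit. A minimal sketch of a grammar enforcing a file-edit-style reply is below; the markers, rule names, and format are illustrative assumptions, not mentat's actual change format.

```
# GBNF sketch (llama.cpp grammar syntax). Hypothetical "file edit" shape:
# one or more edits, each naming a file and carrying a code body.
root     ::= edit+
edit     ::= "@@file " filename "\n" body "@@end\n"
filename ::= [a-zA-Z0-9_./-]+
body     ::= [^@]*
```

With a grammar like this loaded, even a small local model physically cannot produce output outside the expected structure, which sidesteps the format-following problem entirely.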
For adding other Open-AI compatible endpoints I've made a new issue: https://github.com/biobootloader/mentat/issues/58
This is already possible by setting the environment variable OPENAI_BASE_URL to an OpenAI-compatible endpoint (like the one offered by llama-cpp-python, and I think Azure too). This is the case for all packages using the openai package.
I suggest this issue be closed, and perhaps the env var documented.
@ErikBjare Maybe I'm misunderstanding something, but I don't believe the openai package automatically updates api_base from that environment variable, just as it doesn't automatically set api_key; we would need to read the env variable and set it ourselves if we want to support other base URLs.
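Either way, the fallback described here (reading the variable ourselves and applying it) is a one-liner. A minimal sketch, assuming the conventional OPENAI_API_BASE variable name; resolve_api_base is a hypothetical helper, not part of any library:

```python
import os

# Default endpoint used when no override is set.
DEFAULT_API_BASE = "https://api.openai.com/v1"

def resolve_api_base() -> str:
    """Return the API base URL, honoring an OPENAI_API_BASE override."""
    return os.environ.get("OPENAI_API_BASE", DEFAULT_API_BASE)

# Pointing the client at a local OpenAI-compatible server,
# e.g. one started by llama-cpp-python:
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"
print(resolve_api_base())  # -> http://localhost:8000/v1
```

The resolved value would then be assigned to the client's api_base before any request is made, so every OpenAI-compatible endpoint works without code changes.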
@jakethekoenig It definitely does, that's all I've done in my own projects.
Here's the source for reference:
@ErikBjare ah, thanks so much for linking the source! In your original message you wrote OPENAI_BASE_URL, but it's actually OPENAI_API_BASE. Yep, it works for me.
Oops, sorry for the typo!
Will there be such an option?