Closed: ioma8 closed this issue 11 months ago
I would like to see Azure OpenAI 🙏 (not local, I know, but jumping in here anyway)
I'd definitely like to provide options for other models, but they have to be good enough to both A) write good code and B) respond in the required format. One option here would be to fine-tune a model to always respond in the required format, or force it to output in correct format using grammar-based-sampling.
@profplum700 do you have access to GPT-4 on Azure? I was having trouble getting access to it myself. Regardless, we can definitely add that.
@biobootloader I only have GPT-3.5 access. I could help test that only.
I'd suggest adding an endpoint URL for the config. That way, we could use any OpenAI-compatible API.
Please do. Love these coding assistants, but until they can be used with a locally hosted LLM, they cost money to use. What about WizardCoder?
I tried WizardCoder, and the problem is that it's not able to follow the required format. It can produce the code, but sadly it cannot create any files. I tried tweaking it, with no luck.
More reasons to support custom LLM endpoints. There is a service https://openrouter.ai/docs#api-keys allowing us to use gpt-4-32k and claude-2 APIs. All we need is the ability to edit the endpoint.
@biobootloader Realistically, WizardCoder is the only open model available right now that can write decent code. Grammar-based sampling works perfectly for responding in a particular format, but the ggml StarCoder program currently lacks support for grammar-based sampling. I've got access to Azure GPT-4 (after applying to literally 3 waitlists) and it's much faster than OpenAI's endpoints in terms of latency and tok/s.
@ztxtz this would be great!
Awesome. I'll check this out
@zakkor Yeah, grammar-based sampling could be a solution for local models that aren't smart enough to k-shot the output format from examples in the system prompt. Another option would be to fine-tune them to always respond in the mentat change format.
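For context on what grammar-based sampling involves: in llama.cpp it is driven by a GBNF grammar file that constrains which tokens the model may emit. A minimal sketch of a grammar enforcing a file-edit-style reply is below; the markers, rule names, and format are illustrative assumptions, not mentat's actual change format.

```
# GBNF sketch (llama.cpp grammar syntax). Hypothetical "file edit" shape:
# one or more edits, each naming a file and carrying a code body.
root     ::= edit+
edit     ::= "@@file " filename "\n" body "@@end\n"
filename ::= [a-zA-Z0-9_./-]+
body     ::= [^@]*
```

With a grammar like this loaded, even a small local model physically cannot produce output outside the expected structure, which sidesteps the format-following problem entirely.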
For adding other Open-AI compatible endpoints I've made a new issue: https://github.com/biobootloader/mentat/issues/58
This is already possible by setting the environment variable OPENAI_BASE_URL to an OpenAI-compatible endpoint (like the one offered by llama-cpp-python, and I think Azure too). This is the case for all packages using the openai package.
I suggest this issue be closed, and perhaps the env var documented.
@ErikBjare Maybe I'm misunderstanding something, but I don't believe the openai package automatically updates api_base from that environment variable, just as it doesn't automatically set api_key; we would need to read the env variable and set it ourselves if we want to support other base URLs.
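Either way, the fallback described here (reading the variable ourselves and applying it) is a one-liner. A minimal sketch, assuming the conventional OPENAI_API_BASE variable name; resolve_api_base is a hypothetical helper, not part of any library:

```python
import os

# Default endpoint used when no override is set.
DEFAULT_API_BASE = "https://api.openai.com/v1"

def resolve_api_base() -> str:
    """Return the API base URL, honoring an OPENAI_API_BASE override."""
    return os.environ.get("OPENAI_API_BASE", DEFAULT_API_BASE)

# Pointing the client at a local OpenAI-compatible server,
# e.g. one started by llama-cpp-python:
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"
print(resolve_api_base())  # -> http://localhost:8000/v1
```

The resolved value would then be assigned to the client's api_base before any request is made, so every OpenAI-compatible endpoint works without code changes.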
@jakethekoenig It definitely does, that's all I've done in my own projects.
Here's the source for reference:
@ErikBjare ah, thanks so much for linking the source! In your original message you wrote OPENAI_BASE_URL, but it's actually OPENAI_API_BASE. Yep, it works for me.
Oops, sorry for the typo!
Will there be such an option?