MichaelYuhe / ai-group-tabs

Organize and group your Chrome tabs with AI
MIT License
971 stars, 85 forks

[feature request]: support for ollama self-hosted llm #35

Open skyf0cker opened 9 months ago

skyf0cker commented 9 months ago

After a few tests, I found that open-source models like Mistral 7B can also do this job well. By supporting these self-hosted models, users don't need to worry about networking issues related to connecting to OpenAI or the potential costs associated with using its models.

[screenshots attached showing the Ollama request and response]
nohzafk commented 9 months ago

I used LM Studio to start a local server successfully; you can try http://localhost:11434 as the API URL in the extension options.

skyf0cker commented 9 months ago

> I used LM Studio to start a local server successfully; you can try http://localhost:11434 as the API URL in the extension options.

Only fill the API field with the local server address and leave the model name unchanged? It looks like OpenAI and Ollama use different API paths. I don't think this can work, but I will give it a try.
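For reference, a rough sketch of the two request shapes (the base URLs, model names, and helper functions below are illustrative assumptions, not the extension's actual code):

// OpenAI-compatible servers (OpenAI itself, LM Studio) expect a chat completions call:
async function classifyViaOpenAI(tabUrl: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: `Classify this tab: ${tabUrl}` }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim();
}

// Ollama's native API uses a different path and payload shape:
async function classifyViaOllama(tabUrl: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "mistral",
      stream: false,
      prompt: `Classify this tab: ${tabUrl}`,
    }),
  });
  const data = await res.json();
  return data.response.trim();
}

So pointing the extension's API URL at Ollama isn't enough by itself; the request body and response parsing would also need an Ollama branch.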

skyf0cker commented 9 months ago

> I used LM Studio to start a local server successfully; you can try http://localhost:11434 as the API URL in the extension options.

It doesn't work. Maybe that's because the LM Studio server you used exposes the same API path as OpenAI?

nohzafk commented 9 months ago

It has something to do with the prompt format and OpenAI API compatibility; LM Studio can handle both. I don't have experience with Ollama, so you might need to figure it out.

skyf0cker commented 9 months ago

> It has something to do with the prompt format and OpenAI API compatibility; LM Studio can handle both. I don't have experience with Ollama, so you might need to figure it out.

Yep, Ollama uses a different API path (you can check it in its docs: https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-completion), or you can see it in my screenshots in the description of the issue. According to the docs, using Mistral's recommended prompt template also gives noticeably better results. However, all of this is achievable. If support for Ollama is acknowledged at the product level, I can work out the implementation details, and I can also dedicate some of my spare time to the implementation itself.

Example using Mistral with the recommended prompt template:

# Query the local Ollama server for a single, non-streaming completion
curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "stream": false,
  "prompt": "<s>[INST] You are a url classifier, you based on the given url to classify the browser tab type as one of the following: Development, Utilities, Entertainment. Respond with only one single word (without any explanation or punctuation) from the given list. So for instance the following: https://github.com/skyf0cker/ai-group-tags will belong to: [/INST]Development</s>[INST]https://reddit.com[/INST]"
}' | jq '.response'

response: "Entertainment"

nohzafk commented 9 months ago

I definitely think supporting a local LLM is the ideal choice. The task is well-suited for a small local LLM. Any contributions are welcome! 👍

tikikun commented 9 months ago

Right now you can use https://nitro.jan.ai/; it supports an OpenAI-compatible endpoint.

MichaelYuhe commented 9 months ago

I think the task doesn't even need a local LLM; it can be done with traditional embeddings. Just run the embedding and classification inside the browser with JavaScript. It's faster and protects users' privacy.
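A minimal sketch of that idea, assuming an in-browser embedding library such as transformers.js (the model name, categories, and function names below are placeholders, not the extension's actual code):

import { pipeline } from "@xenova/transformers";

// Assumption: a small sentence-embedding model that transformers.js can run in the browser via WASM.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

const categories = ["Development", "Utilities", "Entertainment"]; // example groups

async function embed(text: string): Promise<number[]> {
  // Mean-pool and normalize so a dot product equals cosine similarity.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

async function classifyTab(title: string, url: string): Promise<string> {
  const categoryVectors = await Promise.all(categories.map((c) => embed(c)));
  const tabVector = await embed(`${title} ${url}`);
  let best = 0;
  categoryVectors.forEach((vec, i) => {
    if (dot(vec, tabVector) > dot(categoryVectors[best], tabVector)) best = i;
  });
  return categories[best];
}

Everything stays on-device, so no API key or network round-trip is needed once the model weights are cached.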

hqwuzhaoyi commented 8 months ago

How about adding keywords for classification, and processing them just like Filter Rules?
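A quick sketch of what that could look like (the rule shape and names here are hypothetical, not the extension's actual Filter Rules format):

// Hypothetical keyword rules, checked before falling back to the model.
const keywordRules: Record<string, string[]> = {
  Development: ["github.com", "stackoverflow.com", "localhost"],
  Entertainment: ["youtube.com", "reddit.com", "netflix.com"],
};

function classifyByKeywords(url: string): string | undefined {
  for (const [group, keywords] of Object.entries(keywordRules)) {
    if (keywords.some((kw) => url.includes(kw))) return group;
  }
  return undefined; // no keyword matched; fall back to the model or leave the tab ungrouped
}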

nohzafk commented 8 months ago

I believe that Candle is an excellent choice, and I recommend considering supporting it. Candle primarily focuses on serverless inference and provides the ability to run models within browsers using wasm.

#77 is also talking about local-first computation support.

rainzee commented 8 months ago

Yes, using an LLM is a bit of overkill; our task is relatively simple.

#77 discusses the big-picture tradeoff.

MichaelYuhe commented 8 months ago

I think the best solution is training or using a small model that runs in the browser.

nohzafk commented 8 months ago

I hold the same opinion; I tried the Microsoft Phi-2 model (very small, around 2.7G), but unfortunately it did not perform well on this classification task.