David-Kunz / gen.nvim

Neovim plugin to generate text using LLMs with customizable prompts

Instructions to run with llama-cpp-python or llama.cpp's server directly? #54

Closed: raghur closed this issue 7 months ago

raghur commented 9 months ago

Hello,

Is there a simple way to run this with llama.cpp's server or the OpenAI-compatible service exposed by llama-cpp-python?

I've seen #1, but it might help to document this directly, since Ollama currently does not bundle GPU-optimized builds, which are easy to build with the other two options.
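
For context, both alternatives expose an HTTP server out of the box. Roughly like this (untested sketch; model paths are placeholders, and the exact flags may differ between versions):

# llama.cpp: run its built-in HTTP server (the binary is named `server`)
./server -m models/model.gguf --port 8080

# llama-cpp-python: install the server extra and run the
# OpenAI-compatible server (listens on port 8000 by default)
pip install 'llama-cpp-python[server]'
python -m llama_cpp.server --model models/model.gguf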

David-Kunz commented 9 months ago

Hi @raghur ,

I haven't tried it, but could that be achieved by adapting

opts = {
  init = '...',
  command = '...' -- or function
}

?
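
For example, a minimal, untested sketch that points the plugin at llama.cpp's /completion endpoint, assuming command accepts a curl string in which the plugin substitutes the JSON request body for $body (as in the default configuration), and that the server is already running, so no init step is needed:

-- Untested sketch: redirect gen.nvim's request from Ollama to a
-- llama.cpp server assumed to be listening on localhost:8080.
require('gen').setup({
  command = "curl --silent --no-buffer -X POST http://localhost:8080/completion -d $body",
})

The request and response JSON would still need to match what the plugin expects, which is the catch described below.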

mmealman commented 9 months ago

I believe the issue would be that the response from OpenAI-compatible APIs won't match the JSON you're expecting. For example, making a call to Text Generation WebUI's OpenAI endpoint with curl:

 curl http://127.0.0.1:5000/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "How are you doing today?"}], "temperature": 0.7 }'

The response:

{"id":"chatcmpl-1703657389591527424","object":"chat.completions","created":1703657389,"model":"LoneStriker_goliath-120b-3.0bpw-h6-exl2","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I'm doing well, thank you for asking. How about you?"}}],"usage":{"prompt_tokens":42,"completion_tokens":16,"total_tokens":58}}
David-Kunz commented 9 months ago

Could it be mapped with jq or similar tools?
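
Something like the following (untested) might work, assuming the plugin parses Ollama's /api/generate shape, i.e. objects with a response field and a final done flag:

curl --silent http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "How are you doing today?"}], "temperature": 0.7}' \
  | jq '{response: .choices[0].message.content, done: true}'

That would reshape the OpenAI-style reply into something resembling the final object Ollama emits, though streaming responses would still need per-chunk handling.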