Closed raghur closed 7 months ago
Hi @raghur ,
I haven't tried it, but could that be achieved by adapting
opts = {
    init = '...',
    command = '...' -- or function
}
?
I believe the issue would be that the response from OpenAI-compatible APIs won't match the JSON you're expecting. For example, calling Text Generation WebUI's OpenAI-compatible endpoint with curl:
curl http://127.0.0.1:5000/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "How are you doing today?"}], "temperature": 0.7 }'
The response:
{"id":"chatcmpl-1703657389591527424","object":"chat.completions","created":1703657389,"model":"LoneStriker_goliath-120b-3.0bpw-h6-exl2","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I'm doing well, thank you for asking. How about you?"}}],"usage":{"prompt_tokens":42,"completion_tokens":16,"total_tokens":58}}
Could it be mapped with jq or similar tools?
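To make the mapping question concrete, here is a minimal sketch in Python of pulling the assistant text out of a response shaped like the one above. It assumes the only thing needed downstream is the reply text; `extract_content` and `SAMPLE` are hypothetical names for illustration, not part of the plugin or of any server.

```python
import json


def extract_content(raw: str) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    body = json.loads(raw)
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Same path a jq filter would follow: .choices[0].message.content
    return choices[0]["message"]["content"]


# Trimmed copy of the response shown above (illustrative only).
SAMPLE = json.dumps({
    "object": "chat.completions",
    "choices": [{
        "index": 0,
        "finish_reason": "stop",
        "message": {
            "role": "assistant",
            "content": "I'm doing well, thank you for asking. How about you?",
        },
    }],
})

if __name__ == "__main__":
    print(extract_content(SAMPLE))
```

The jq equivalent of the lookup would be `jq -r '.choices[0].message.content'`. Whether that output can be fed back in without further changes depends on what format the plugin expects, which I haven't checked.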
Hello,
I'm hoping there's a simple way to run this with llama.cpp's server or the OpenAI-compatible service exposed by llama-cpp-python? I've seen #1, but it might help to document this directly, since ollama currently does not bundle GPU-optimized builds, which are quite easy to build with the other two options.