Closed shiffman closed 6 months ago
In ollama, message history collection is not implemented for streaming. Without it, the message history is never populated and the LLM has no context from earlier turns.
I have unified the API response types from ollama and replicate, so that the consumer does not have to differentiate between them. Also added message history collection for ollama. (Untested)
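For anyone reading along, here is a rough sketch of what a unified chunk type plus history collection could look like. This is not the code in this PR: `StreamChunk`, `fromOllama`, `fromReplicate`, and `createReplyCollector` are hypothetical names, and the input shapes are assumptions (an ollama chat stream object carrying `message.content` and `done`, and a replicate server-sent event with `event` and `data` fields).

```typescript
// Hypothetical unified shapes; illustrative only, not the types used in this PR.
type ChatMessage = { role: "user" | "assistant"; content: string };
type StreamChunk = { text: string; done: boolean };

// Assumed shape of one streamed ollama chat object.
type OllamaChunk = { message?: { role?: string; content?: string }; done?: boolean };

// Assumed shape of one replicate server-sent event.
type ReplicateEvent = { event: string; data: string };

// Map an ollama chunk to the unified shape.
function fromOllama(chunk: OllamaChunk): StreamChunk {
  return { text: chunk.message?.content ?? "", done: chunk.done === true };
}

// Map a replicate event to the unified shape; only "output" events carry text.
function fromReplicate(ev: ReplicateEvent): StreamChunk {
  return { text: ev.event === "output" ? ev.data : "", done: ev.event === "done" };
}

// Returns a function that accumulates streamed text and, once a chunk is
// marked done, appends the full assistant reply to the shared history.
function createReplyCollector(history: ChatMessage[]): (chunk: StreamChunk) => void {
  let pending = "";
  return (chunk: StreamChunk): void => {
    pending += chunk.text;
    if (chunk.done) {
      history.push({ role: "assistant", content: pending });
      pending = "";
    }
  };
}
```

With this shape the consumer only ever sees `StreamChunk`, and the history gains one assistant message per completed stream regardless of which back-end produced it.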
Unrelated to this, I have also added a new command, `manualPrompt`, to prompt the AI by typing the message (as an alternative to voice input, for testing purposes).
I'm going to merge this to keep ollama on track as an alternative back-end. If anyone wants to try hooking this up to either GPT-4 or Gemini, I'm happy to provide API keys to see how these models perform in comparison to llama!!
The `consumeStream()` method doesn't work with ollama due to a slightly different format, so I've adapted the code. I am not having the `[INST]` issue with ollama. However, running 70b-chat on my M1 laptop seems to tax it quite a bit (longer latency than replicate streaming, and the fan goes crazy).
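To illustrate the kind of format difference involved, here is a minimal sketch that assumes the ollama stream arrives as newline-delimited JSON (one object per line) instead of server-sent events. `parseNdjson` is a hypothetical helper, not the adapted code from this PR. It keeps any trailing partial line in a buffer, since a network chunk can end in the middle of an object.

```typescript
// Hypothetical helper: splits buffered text into complete JSON lines.
// The last (possibly incomplete) line is returned as `rest` so the caller
// can pass it back in together with the next network chunk.
function parseNdjson(
  buffer: string,
  incoming: string,
): { objects: unknown[]; rest: string } {
  const lines = (buffer + incoming).split("\n");
  // Everything after the final newline may be a partial object.
  const rest = lines.pop() ?? "";
  const objects = lines
    .filter((line) => line.trim() !== "") // skip blank lines
    .map((line) => JSON.parse(line) as unknown);
  return { objects, rest };
}
```

A caller would start with an empty buffer, feed each decoded chunk through `parseNdjson`, handle the returned objects, and carry `rest` forward to the next call.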