Open Pawe98 opened 8 months ago
Other differences were found, but maybe they are intentional (it would be nice if I could change them):
llama_new_context_with_model: n_ctx = 2048
vs
llama_new_context_with_model: n_ctx = 4096
as well as some cache and buffer sizes.
The general purpose of this issue is (1) the typo bug and (2) to request a fix/feature that makes it possible to use the same model for embedding and chat via Ollama (see the Modelfile sketch below for the context-size part).
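If the differing n_ctx values come from the two load requests rather than from the model itself, one way to experiment is to pin the context window in a custom Ollama Modelfile so that both the chat load and the embedding load ask for the same size. A minimal sketch, assuming the client does not override num_ctx per request (the `deepseek-coder-4k` tag below is just an illustrative name):

```
# Modelfile — pin the context window so every load of this model uses the same n_ctx
FROM deepseek-coder:6.7b
PARAMETER num_ctx 4096
```

Built with `ollama create deepseek-coder-4k -f Modelfile` and then referenced under that tag in the config; whether Continue still sends its own context size with the embedding request would need to be verified.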
Before submitting your bug report
Relevant environment info
Description
I have the same Ollama model configured as embeddingsProvider and as model. I can see in the Ollama logs that it unloads and reloads the same model, but with a different configuration. One difference I could easily identify (probably due to a typo) is the BOS token:
llm_load_print_meta: BOS token = 32013 '<｜begin▁of▁sentence｜>'
vs.
llm_load_print_meta: BOS token = 32013 '<｜begin▁of▁sentence｜>''
As you can see, there is one single quote too many at the end.
Model: deepseek-coder:6.7b
Config.json
To reproduce
Use the same Ollama model as embeddings provider and as chat model, then perform a query that uses the embeddings, such as @Codebase.
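For reference, a minimal sketch of the kind of config.json that reproduces this, with the same Ollama model used for both chat and embeddings. Field names follow Continue's documented config.json schema as I understand it; the title is a placeholder and this is not the reporter's actual file:

```json
{
  "models": [
    {
      "title": "DeepSeek Coder 6.7B",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b"
    }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "deepseek-coder:6.7b"
  }
}
```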
Log output
This is the log of a program that loads the same model twice; do you know what the difference between the loads is?
log: