zouy50 opened this issue 1 year ago
OK, I added these params to the LlamaCpp call like this:
```python
kwargs = {
    "model_path": model_path,
    "n_batch": 512,
    "n_ctx": max_ctx_size,
    # "max_tokens": max_ctx_size,
    # "echo": True,
    "verbose": True,
    "callback_manager": CallbackManager([StreamingStdOutCallbackHandler()]),
    "f16_kv": True,
}
```
The key param is this one: `"callback_manager": CallbackManager([StreamingStdOutCallbackHandler()])`.
The LangChain guide on local LLMs is here: https://python.langchain.com/docs/guides/local_llms
I found the problem: ingest.py has no param for using mps, so I will fix this.
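I haven't confirmed the exact code in ingest.py, but as a sketch, passing the device through to the embedding model would look something like this (assuming it uses HuggingFaceInstructEmbeddings; the model name is a placeholder):

```python
from langchain.embeddings import HuggingFaceInstructEmbeddings

device_type = "mps"  # "cpu", "cuda", or "mps" on Apple Silicon

embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",  # placeholder embedding model
    model_kwargs={"device": device_type},
)
```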
The problem is: I can only see the output when it is computed by the CPU. If I switch to device_type mps it runs fine, but I cannot see the output, and when I run

```python
pprint(memory.load_memory_variables({}))
```

I get a lot of garbled characters (a minimal sketch of this memory inspection is after my environment details below).

My environment:
- CPU and GPU: Apple M1, 16 GB
- Python version: 3.11.4
- llama-cpp-python == 0.1.78
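For reference, a minimal sketch of that memory inspection, assuming a standard ConversationBufferMemory is attached to the chain:

```python
from pprint import pprint
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="history")
memory.save_context({"input": "hi"}, {"output": "hello"})
# Expected output: {'history': 'Human: hi\nAI: hello'}
pprint(memory.load_memory_variables({}))
```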
I have also tried running llama alone, without LangChain: mps runs fine and outputs text, and mps is faster than the CPU (a sketch of that standalone test is below). So I don't know whether the problem is in LangChain or somewhere else, and I don't know how to fix it. I really don't want to use the CPU; it's slow and makes my Mac very hot.
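The standalone test looked roughly like this (a sketch, assuming llama-cpp-python was installed with Metal support, e.g. `CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python`; the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/your/model.bin",  # placeholder
    n_ctx=2048,
    n_gpu_layers=1,  # offload to the Apple GPU via Metal
)

# Stream tokens so the output shows up as it is generated.
for chunk in llm("Q: Name three colors. A:", max_tokens=64, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```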