KPCOFGS closed this issue 11 months ago
Since running with the custom model selection will use the CPU (or the GPU if available, or Apple M-series chips), the next best option would be a purpose-built tool like LocalAI, where you can run the service with GPU acceleration and custom models (GGUF format).
You will likely get better performance this way by offloading inference to a machine with a GPU.
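Something along these lines is a starting point (a rough sketch only; the image name, flags, and model folder are assumptions based on the LocalAI quickstart, so check their docs for the current syntax):

```bash
# Start LocalAI and mount a host folder containing your GGUF models
docker run -p 8080:8080 \
  -v $PWD/models:/models \
  quay.io/go-skynet/local-ai:latest \
  --models-path /models

# Models placed in ./models should then show up here
curl http://localhost:8080/v1/models
```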
It is also worth noting that running AnythingLLM in docker will force CPU-based inference and not use your underlying host system. Only in development will the local model use the underlying host system - hence why this is still experimental :)
@timothycarambat Hi, I'm also confused about why the GGUF model cannot be loaded into AnythingLLM. When running AnythingLLM, Model Selection is always empty. Could you please give some advice?
My env was built locally from source following the tutorial.
My env: Mac M1
model path: /app/server/storage/models/ggml-model-q4_0.gguf
Hi! I think you will need to put the model under the /server/storage/models/downloaded folder.
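For example, something like this (a sketch; the /app prefix is taken from your path above and may differ for your setup):

```bash
# Create the downloaded folder if it doesn't exist, then move the GGUF into it
mkdir -p /app/server/storage/models/downloaded
mv /app/server/storage/models/ggml-model-q4_0.gguf \
   /app/server/storage/models/downloaded/
```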
@KPCOFGS Thanks. It works after changing the model_path to /app/server/storage/models/downloaded/ggml-model-q4_0.gguf.
@KPCOFGS @timothycarambat Hi, could you please help me with this issue? When I send a chat message to the local LLM (https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6), it does not respond. After resetting the LLM preference, the docker container crashed when I sent a chat message.
What do the docker logs say?
@timothycarambat Where can I find some log information for debugging?
Even after setting up a LocalAI LLM server, AnythingLLM cannot connect to it. It seems that AnythingLLM also cannot get responses to messages from the LocalAI server. It works well only if I set the OpenAI API as the LLM server.
LocalAI LLM server:
Local connection test:
```
curl http://localhost:8080/v1/models
{"object":"list","data":[{"id":"ggml-gpt4all-j","object":"model"}]}
```
anything-llm with LocalAI server:
anything-llm with OpenAI API Key:
The docker logs are available however you run AnythingLLM, so docker logs <CONTAINER_ID>,
or if you're in Docker Desktop you can see them by clicking on the running container.
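For example (the container ID is whatever docker ps shows for your AnythingLLM container):

```bash
# List running containers, then follow the logs of the AnythingLLM one
docker ps
docker logs -f <CONTAINER_ID>
```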
Where can I find this model?
When I tried to use my own custom Llama model that I downloaded from the internet, I see it says "Using a locally hosted LLM is experimental. Use with caution." When I tried to load a model by clicking "waiting for models," nothing happened. If I want to add the custom LLM manually, do I need to put it in storage/models/downloaded?