-
Hi, I've been following this project's model testing for a while. On my side I'm running llama.cpp; since we have company servers, it runs CPU-only at about 10 t/s, which is plenty for internal use.
Models used for the knowledge base:
Qwen2.5-14B: q4
bge-reranker-base
Dmeta-embedding-zh-small
Those are the models I'm using. Could this be switched over to run on CPU as well? Thanks a lot.
# start in the background noh…
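For reference, a minimal CPU-only sketch using the llama-cpp-python bindings; the model path, thread count, and prompt below are illustrative placeholders, not values from the original report:
```
# Minimal CPU-only inference sketch with llama-cpp-python.
# Assumes a local GGUF quantization of Qwen2.5-14B; the path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2.5-14b-instruct-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=0,   # offload nothing: pure CPU inference
    n_threads=16,     # tune to the server's physical core count
    n_ctx=4096,       # context window
)

out = llm("Answer from the knowledge base: ...", max_tokens=128)
print(out["choices"][0]["text"])
```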
-
### Describe the bug
Hi!
I tried to run an LLM locally using `openllm`, and `phi3:3.8b-ggml-q4` happens to be the only model that I am able to run locally according to openllm, so I ran `openl…
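The command above is cut off; for context, a minimal sketch of querying a model served locally by OpenLLM through its OpenAI-compatible API. The port, API key, and model id are assumptions, not values from the original issue:
```
# Sketch: query a locally running OpenLLM server via its OpenAI-compatible
# endpoint. Assumes the server is already serving; host/port and model id
# below are assumptions, not values from the original report.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

resp = client.chat.completions.create(
    model="phi3:3.8b-ggml-q4",  # model id as reported by openllm (assumed)
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```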
-
There seems to be no configuration for `.env.local` that I can get to work to connect to a Llama3 inference endpoint hosted by HuggingFace cloud (and I can find no examples).
```
MONGODB_URL=mong…
```
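For what it's worth, the shape chat-ui documents for a remote endpoint looks roughly like the sketch below; the model name, endpoint URL, and token are placeholders, and the exact fields should be checked against the repo's README for the version in use:
```
MODELS=`[
  {
    "name": "meta-llama/Meta-Llama-3-8B-Instruct",
    "endpoints": [
      {
        "type": "tgi",
        "url": "https://<your-endpoint>.endpoints.huggingface.cloud",
        "authorization": "Bearer hf_<token>"
      }
    ]
  }
]`
```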
-
I went through the code and it downloads a model from GPT4ALL.
How can I add my .gguf file to the Android project and use it instead? I won't be able to share it on the App Store, but that's OK.
H…
-
### Describe the issue as clearly as possible:
I run `examples/llamacpp_example.py`
```
outlines/models/llamacpp.py:180: FutureWarning: The input object of type 'Tensor' is an array-like implem…
```
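For anyone reproducing this, a minimal sketch of the pattern the example exercises, i.e. outlines driving a llama.cpp model; the repo id and GGUF filename are illustrative placeholders, not the ones from the example:
```
# Sketch: constrained generation with outlines on a llama.cpp backend.
# Repo id and GGUF filename are hypothetical placeholders.
from outlines import models, generate

model = models.llamacpp(
    "Qwen/Qwen2.5-0.5B-Instruct-GGUF",    # hypothetical repo id
    "qwen2.5-0.5b-instruct-q4_k_m.gguf",  # hypothetical filename
)

# Restrict the model's output to one of two labels.
generator = generate.choice(model, ["Positive", "Negative"])
print(generator("The movie was great. Sentiment:"))
```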
-
### Environment
🐧 Linux
### System
Mozilla/5.0 Linux x86_64 Firefox/131.0
### Version
staging (last version of this repo)
### Desktop Information
Node JS Version
Node.js v18.20.4.
API
oobabo…
-
Everyone's gotta have an LLM-powered search engine feature, right?
https://github.com/developersdigest/llm-answer-engine
-
I am getting this error:
```
llama.cpp: loading model from /Documents/Proj/delta/llama-2-7b-chat/ggml-model-q5_1.bin
error loading model: unrecognized tensor type 14
llama_init_from_file: failed…
```
-
Outlines currently supports the vLLM inference engine; it would be great if it could also support the TensorRT-LLM inference engine.
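For context, the existing vLLM path looks roughly like the sketch below (the model name and pattern are illustrative placeholders); a TensorRT-LLM backend would presumably slot in the same way:
```
# Sketch of the existing vLLM integration that a TensorRT-LLM backend
# would parallel; the model name is an illustrative placeholder.
from outlines import models, generate

model = models.vllm("mistralai/Mistral-7B-v0.1")  # hypothetical model choice

# Constrained generation is expressed the same way regardless of backend.
generator = generate.regex(model, r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
print(generator("The ISO date today is: "))
```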
-
Adding high-res support would make images way more detailed, although it's somewhat difficult since Cascade is already hard to run with its two models. Also, maybe incorporate an option for an LLM like west …