-
**Describe the bug**
Running a model from a GGUF file using [llama.cpp](https://github.com/ggerganov/llama.cpp) is very straightforward:
`server -v -ngl 99 -m Phi-3-mini-4k-instruct-Q6…
-
My enc-dec model does not use `relative attention bias`, so I'm trying to build the engine in plain TRT mode.
However, when building the engine without `gpt_attention_plugin`, I got the following error:
…
-
An email was sent to the power and inference WGs about this; I'm still adding it as an issue for better tracking.
During the last LLM taskforce meeting, the long runtime of the LLMs was raised as a concern and t…
-
### Describe the bug
The interpreter throws a network connection error when running in local mode while disconnected from the internet. It throws errors twice. First, even before asking for mod…
-
Hey, I recently learned that you can use privateGPT (which uses GPT4All, offline). It's not as good as OpenAI, I know, but it's enough. Maybe you could integrate it so users have the choice to use OpenAI or a custom offlin…
-
I'm using [ChatGLM3-6b](https://github.com/THUDM/ChatGLM3) as the LLM. It works normally in pure LLM mode.
When used in doc query mode, it takes a long time to search the document (I believe somethin…
-
### Describe the bug
Hello,
I would like to try OpenLLM offline, but I can't.
For my test, I downloaded the huggyllama--llama-7b model on another computer with internet access and copied the bento home to another c…
-
Hi, I'm trying the official image with config
```bash
vllm:
```
-
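**Routine checks**
[//]: # 'Put an x in the box to tick it'
- [x] I have confirmed that there is currently no similar feature request
- [x] I have confirmed that I have upgraded to the latest version
- [x] I have read the project README in full and determined that the current version cannot meet my needs
- [x] I understand and am willing to follow up on this feature, help with testing, and provide feedback
- [x] I understand and agree with the above, and understand that the project maintainers have limited time and energy; **rule-violating fe…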
-
I've run the instruction-tuning bash script but don't see a new checkpoint. Do you just overwrite the old checkpoint?