-
the script I use is https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/generate.py
with the model Llama-2-70b-hf, the output sometimes is e…
-
The system prompt in the [llama2 blog post](https://huggingface.co/blog/llama2) contains an extra space and newline compared to the [original](https://github.com/facebookresearch/llama/blob/6c7f…
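Such whitespace discrepancies are easy to miss by eye. A minimal sketch (using placeholder strings, not the actual Llama 2 prompts) of how `repr()` makes a trailing space and newline visible:

```python
# Illustrative placeholders only -- NOT the real Llama 2 system prompts.
blog_version = "You are a helpful assistant. \n"   # extra trailing space + newline
original_version = "You are a helpful assistant."

# The strings compare unequal even though they look identical when printed.
print(blog_version == original_version)  # False

# repr() exposes the hidden whitespace at the end of the first string.
print(repr(blog_version))
print(repr(original_version))
```

Since the tokenizer sees the raw string, even an invisible difference like this can change the tokenization of the prompt.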
-
Hi,
I'm trying to use your wonderful framework for inference only. However, I'm not familiar with the serving-related settings in your code. How can I remove them, or change the code a bit?
By the way, …
-
Llama2-Chinese-13b-chat online demo: the answers have all switched to English
![image](https://github.com/FlagAlpha/Llama2-Chinese/assets/15713149/682a7e05-6f02-4af2-961f-ced734a402f7)
-
### Prerequisites
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the [Discussions](https://git…
-
I heard that CUDA is not actually needed when CRAG is running. Is that so?
> there is no NVIDIA CUDA on the Mac Apple Silicon series computers.
errors:
```
Preparing metadat…
```
-
## setting
server command: mlc_llm serve mlc-llama2-7b-q4 --overrides "tensor_parallel_shards=2" --mode server
request: the request rate is 20 requests/s
gpu: a40
## ❓ General Questions
10 request/…
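For context, a rough back-of-the-envelope sketch of the aggregate decode throughput such a request rate implies (the average output length here is an assumed number, not taken from the issue):

```python
# Back-of-the-envelope estimate of sustained decode throughput.
request_rate = 20          # requests per second (from the issue)
avg_output_tokens = 256    # ASSUMED average completion length, not measured

# Tokens per second the server must generate just to keep up.
required_tokens_per_s = request_rate * avg_output_tokens
print(required_tokens_per_s)  # 5120
```

If the two-way tensor-parallel A40 setup decodes fewer tokens per second than this, requests will queue and latency will grow without bound.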
-
When I run the Python script, the first agent works fine, but when it's time for the next agent to do its task I get this error:
python3 main.py …
-
1. offline serving
![image](https://github.com/vllm-project/vllm/assets/43260218/87e216b5-9064-4c2a-a021-cac08e22795d)
2. online serving(fastapi)
![image](https://github.com/vllm-project/vllm/ass…
-
Following https://soulteary.com/2023/07/23/build-llama2-chinese-large-model-that-can-run-on-cpu.html
On an Apple M2, using the final docker image `soulteary/llama2:runtime` to run `Chinese-Llama-2-7b-ggml-q4.bin`
```bash
main:…
```