-
### Your current environment
H100 (but I believe it happens on any machine)
### 🐛 Describe the bug
```
--enable-chunked-prefill --max-num-batched-tokens 2048 --kv-cache-dtype "fp8"
```
S…
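For context, these flags would typically be passed to vLLM's OpenAI-compatible server; a minimal launch sketch, assuming the `vllm.entrypoints.openai.api_server` entry point and a placeholder model name (the correct flag spelling is `--max-num-batched-tokens`):

```shell
# Sketch: launch vLLM's OpenAI-compatible server with chunked prefill
# and an fp8 KV cache. MODEL is a placeholder, not from the original report.
MODEL=meta-llama/Llama-2-7b-hf
python -m vllm.entrypoints.openai.api_server \
    --model "$MODEL" \
    --enable-chunked-prefill \
    --max-num-batched-tokens 2048 \
    --kv-cache-dtype fp8
```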
-
## Version of crewai
```
crewai==0.28.8
crewai_tools==0.1.6
```
## Code implementation
```
from langchain_groq import ChatGroq
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")
f…
-
### ⚠️ This issue respects the following points: ⚠️
- [X] This is a **bug**, not a question or a configuration/webserver/proxy issue.
- [ ] This issue is **not** already reported on [Github](https…
-
### Your current environment
```text
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per c…
-
For `XX` in [13b, 13b-chat, 30b-v3, 30b-chat-v3]:
Check upon issue creation:
* [x] The model has not been evaluated yet and doesn't show up on the [CoT Leaderboard](https://huggingface.co/space…
-
**Describe the bug**
When attempting to compress the Meta-Llama/Llama-2-13b-chat-hf model to W8A8 using a combination of GPTQ and SmoothQuant algorithms on an NVIDIA A800 GPU with 80GB of VRAM, I enc…
-
This line of code throws an error:
```
File "/data4/kaisi/RETA-LLM/indexer/index_baichuan.py", line 116, in build_model
model = self.llm.llm_engine.workers[0].model
AttributeError: 'Worker' object has no attribute 'mo…
-
Error message:
> Traceback (most recent call last):
> File "/xx/MiniCPM-V/finetune/finetune.py", line 124, in
> train()
> File "/xx/MiniCPM-V/finetune/finetune.py", line 119, in train
> tra…
-
### Your current environment
I don't know how to run it inside Docker.
### 🐛 Describe the bug
Simply run the following command
`docker run --runtime nvidia --gpus all -v ~/.cache/huggingface:/root/.ca…
-
I use a gRPC server with multithreading to run inference, but I get the following error:
File "/usr/local/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 130, in generate
return self._run_engine(us…
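A likely cause is that an in-process `LLM.generate` call is not safe to invoke concurrently from multiple gRPC handler threads. A common workaround is to serialize access with a lock; a minimal sketch, where `fake_generate` is a stand-in for the real `llm.generate` call:

```python
import threading

# Assumption: the engine must only be entered by one thread at a time,
# so all handler threads funnel through a single lock.
_lock = threading.Lock()

def fake_generate(prompt: str) -> str:
    # Stand-in for llm.generate(prompt); the real engine would go here.
    return f"completion for {prompt!r}"

def generate_serialized(prompt: str) -> str:
    # Only one thread at a time reaches the engine.
    with _lock:
        return fake_generate(prompt)

# Simulate several gRPC handler threads issuing requests concurrently.
results = []
threads = [
    threading.Thread(target=lambda p=p: results.append(generate_serialized(p)))
    for p in ("a", "b", "c", "d")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This trades throughput for safety; a queue-based single consumer thread is an equivalent design with the same effect.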