-
The server seems to be fine, judging by the following log.
```
I1212 03:29:51.067415 37860 server.cc:674]
+----------------+---------+--------+
| Model | Version | Status |
+----------------+---…
-
Hi
The issue: with `--swap-space X` specified, as soon as the CPU KV cache is used, vLLM stops all processing. CPU and GPU usage go to 0%, and the request never returns. Any future requests are also n…
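For context, a minimal sketch of how the flag is typically passed when launching the vLLM OpenAI-compatible server; the model name below is a placeholder for illustration, not taken from this report:

```shell
# Hypothetical reproduction setup: start the vLLM server with CPU swap
# space enabled. --swap-space is the CPU swap size in GiB per GPU; once
# the GPU KV cache fills, blocks are swapped to this CPU region.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --swap-space 4
```

Pushing enough concurrent long-context requests to exhaust the GPU KV cache is what forces the CPU swap path that triggers the hang described above.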
-
I found huge memory leaks caused by the plugin.
To reproduce:
1) Add a single CPathVolume
2) Add a Timer in any blueprint to fire an event connected to Find Async Path, with the Volume ref connected. To get l…
-
### Describe the bug
With the demo run_llama_int8.py, setting `generate_kwargs["do_sample"]` to True, I get the following error:
command:
python run_llama_int8.py -m ${MODEL_ID} --quantized-model-…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar issues.
### Operating system information
Linux
### P…
-
### System Info
CPU x86_64
GPU NVIDIA L40
TensorRT branch: v0.10.0
CUDA: NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2
### Who can help?
@Tracin
### Inf…
-
Creating this issue to initiate discussions about supporting vector embeddings in Pinot.
This [write-up](https://docs.google.com/document/d/1aiXPbwK4rU_YdfMPt3K752SuCMy8KQehqM4ltPg9juE/edit) collat…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …
-
### What is the issue?
This is on AMD. I have 2 x Radeon 7900 XTX cards (24 GB each).
For models whose memory use fits on one GPU, everything works fine.
As soon as both cards are required, the inf…
-
`2024-05-24 23:49:38 WARNING 05-24 15:49:38 utils.py:327] Not found nvcc in /usr/local/cuda. Skip cuda version check!
2024-05-24 23:49:38 INFO 05-24 15:49:38 config.py:379] Using fp8 data type to sto…