-
I am using the `tritonserver:24.08-trtllm-python-py3` image for building and deploying the Llama-3.1-8B-Instruct engine.
In my attempt to serve the model with `tritonserver`, I got the following er…
-
### Your current environment
```text
PyTorch version: 2.3.1
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version:…
-
Hello, while trying vLLM inference I ran the inference_vllm.py script you provided without changing the code, but it fails with out of memory even with 50 GB of GPU memory. I don't understand why, or how much GPU memory is actually required. I am using vllm (0.4.2).
INFO 08-25 08:17:00 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: mode…
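For context, a minimal sketch of how the memory footprint can be constrained when constructing a vLLM 0.4.2 engine; the model path, context length, and utilization values below are placeholders and are not the settings used in inference_vllm.py:
```python
from vllm import LLM, SamplingParams

# Hypothetical model path and limits; adjust to match inference_vllm.py.
llm = LLM(
    model="/path/to/model",       # placeholder checkpoint directory
    dtype="float16",              # half-precision weights
    gpu_memory_utilization=0.85,  # fraction of each GPU vLLM may reserve
    max_model_len=4096,           # shorter context -> smaller KV cache
    tensor_parallel_size=1,       # raise to shard the model across GPUs
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, world"], sampling)
print(outputs[0].outputs[0].text)
```
Lowering `gpu_memory_utilization` or `max_model_len` is the usual first step when the KV-cache allocation alone exceeds the available VRAM.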
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
When I run the pipeline
```
python run_exp.py --method_name 'naive' \
--split 'test' \
--dataset_name 'nq' \
--gpu_id '0,1,2,3'
```
I get t…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch v…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.10 (x86_64)
GCC version: (…
-
Whenever I try to watch a 60 fps video at 720p or higher on YouTube with hardware acceleration enabled, the video turns into a slideshow (the image freezes for about 4 to 5 seconds) while the audio play…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…