streamingllm Search Results

143 results
for streamingllm

Best match

Best match Most commented Newest Recently updated Least commented Oldest Least recently updated

NVIDIA/TensorRT-LLM #1870

vLLM results are better than trt with the same request

### System Info CPU x86_64 GPU NVIDIA L40 TensorRT branch: v0.10.0 CUDA: NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.4 ### Who can help? @kaiyux …

activezhao updated 3 months ago
7
NVIDIA/TensorRT-LLM #2126

How to add gemm_plugin int8

### System Info A100-PCIe-40GB Tensorrt-LLM-verison:0.11.0 ### Who can help? @Tracin ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [ ] An offici…

xiangxinhello updated 1 month ago
16
NVIDIA/TensorRT-LLM #1920

Qwen2-72B-Instruct-GPTQ-Int4 Conversion Success, Run Failure

### System Info NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.4 - GPU properties - GPU name: NVIDIA L20 - GPU memory size: 46068MiB - Libraries - Te…

linchpinlin updated 1 month ago
7
NVIDIA/TensorRT-LLM #1760

Unexpected error from cudaGetDeviceCount() error

### System Info I met a trtllm-build issue. GPU: RTX 3090 I followed official script of the below steps. 1. I ran the below code after installing nvidia container toolkit. ``` docker run -…

ljm565 updated 3 months ago
1
Infini-AI-Lab/TriForce #5

Questions about end2end time cost of the inference request

Hi, thanks for your great job for LLM decoding process. I tested the code and got the expected decoding speedup for llama2-7B, but it seems that the end2end time cost does not change too much? (61s ->…

littletomatodonkey updated 5 months ago
2
NVIDIA/TensorRT-LLM #1861

trtllm-build on GRID vGPU - nvml errors

### System Info - Host: VMware ESXi 7 - Host Nvidia drivers: 550.54.16 - VM CPU architecture: x86_64 - VM Nvidia drivers: 550.54.15 - VM OS: Ubuntu LTS 22.04 - Physical GPU: A100 - TensorRT-LLM…

edesalve updated 1 month ago
8
NVIDIA/TensorRT-LLM #1709

Llava multimodel example is giving segfault

### System Info EC2 instance: G5.48xl Nvidia driver: 535.161.08 Cuda: 12.2 commit 5d8ca2faf74c494f220c8f71130340b513eea9a9 Torch: 2.3.0 ### Who can help? @byshiue running into the issue with h…

buddhapuneeth updated 2 months ago
4
NVIDIA/TensorRT-LLM #1417

What are the suggested arguments to build an efficient engin…

I'm reading the manual here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md The scripts are so simple, do they ensure best performance? ``` python convert_checkpoint.py …

sleepwalker2017 updated 4 months ago
3
LostRuins/koboldcpp #550

ContextShift sometimes degrades output

I'm trying storywriting with KoboldCpp. At some point the story will get longer than the context and KoboldCpp starts evicting tokens from the beginning, with the (newer) ContextShift feature. Sometim…

h3ndrik updated 3 months ago
22
triton-inference-server/tensorrtllm_backend #453

tensorrt-llm serving performance is extreamly low for llama3…

### System Info PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3 LTS (x86_64) GCC version: (Ubuntu 11.4.0-…

RunningLeon updated 1 month ago
4

上一页 1...7 8 9 10 11 12 13...15 下一页

143 results for streamingllm

143 results
for streamingllm