-
### System Info
- CPU architecture: x86_64
- GPU properties
- GPU name: NVIDIA A100
- GPU memory size: 40 GB
- Libraries
- TensorRT-LLM branch or tag: v0.9.0
-…
-
I followed the README to build TensorRT-LLM and ran into the issue below; please help me check it. Thank you!
triton/whisper/README.md
The process seems to be killed unexpectedly while converting the encoder che…
-
I'm reading the manual here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md
The scripts are quite simple; do they ensure the best performance?
```
python convert_checkpoint.py …
-
### System Info
- GPU Name: NVIDIA GeForce RTX 3080 Ti
- System RAM: 65 GB
### Who can help?
@ncomly-nvidia
@byshiue
### Information
- [ ] The official example scripts
- [X] My own modified scri…
-
Are there corresponding test results for the Qwen series of models?
PS: With the default yaml config, running the qasper dataset immediately runs out of GPU memory (NVIDIA A100-SXM4-80GB).
```
model:
  type: inf-llm
  path: /data/model/open_source_data/Qwen/Qwen1.5-7B-Chat
  block_size: 128
  n_ini…
-
### System Info
- CPU: x86
- GPU: 8 X L40s
- TRT LLM Version: "0.12.0.dev2024070200"
- NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5
followed https://nvi…
-
### System Info
While trying to debug the poor quality of outputs from TRT-LLM for Llama 3 70B with tp=4 (compared to vLLM and HF), I ran into the following message when building the bfloat16 engine.
```
[06…
-
I'm trying storywriting with KoboldCpp. At some point the story will get longer than the context and KoboldCpp starts evicting tokens from the beginning, with the (newer) ContextShift feature. Sometim…
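The eviction behavior described above can be pictured as a sliding window over the token stream: a fixed prefix (e.g. the memory/system prompt) is pinned, and once the window fills up, the oldest story tokens are dropped from the front. This is only a minimal illustrative sketch, not KoboldCpp's actual ContextShift implementation; the class and names here are hypothetical.

```python
from collections import deque

class ContextWindow:
    """Sketch of context-shift style eviction: keep a pinned prefix,
    drop the oldest body tokens once the window exceeds max_tokens."""

    def __init__(self, max_tokens, keep_prefix):
        self.max_tokens = max_tokens
        self.prefix = list(keep_prefix)  # pinned, never evicted
        self.body = deque()              # story tokens, oldest first

    def append(self, token):
        self.body.append(token)
        # Evict from the beginning of the story once the window overflows.
        while len(self.prefix) + len(self.body) > self.max_tokens:
            self.body.popleft()

    def tokens(self):
        return self.prefix + list(self.body)

# With an 8-token window and a 1-token pinned prefix, appending
# tokens 0..9 leaves only the last 7 story tokens in the window.
w = ContextWindow(max_tokens=8, keep_prefix=["<sys>"])
for t in range(10):
    w.append(t)
print(w.tokens())  # ["<sys>", 3, 4, 5, 6, 7, 8, 9]
```

The real feature additionally shifts the KV cache so the retained tokens do not need to be reprocessed, which is what makes the eviction cheap.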
-
```bash
python3 QWen1.5_TensorRT-LLM/convert_checkpoint.py --model_dir Qwen1.5-1.8B-Chat --output_dir Qwen1.5-1.8B-Chat-ckpt
trtllm-build --checkpoint_dir ./Qwen1.5-1.8B-Chat-ckpt \
-…
-
### System Info
3090 server
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task in the …