-
### System Info
ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm …
-
### System Info
CPU architecture: x86_64
CPU/Host memory size: 32G
GPU properties: SM86
GPU name: NVIDIA A10
GPU memory size: 24G
Clock frequencies used: 1695MHz
### Libraries
TensorRT-LL…
-
I modified eval.sh as follows:
```
export CUDA_VISIBLE_DEVICES=${1:-1} # default to GPU 1
# model_path=${2:-"meta-llama/Meta-Llama-3-8B-Instruct"} # meta-llama/Meta-Llama-3-8B-Instruct, mistralai/Mistral-7B-Instruct-v0…
```
-
### System Info
- `transformers` version: 4.41.0
- Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.31
- Python version: 3.10.13
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4…
-
### System Info
CPU x86_64
GPU NVIDIA L40
TensorRT branch: v0.10.0
CUDA: NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.4
### Who can help?
@kaiyux
…
-
### System Info
trt_llm 0.11.0.dev2024052800
trt 10.0.1
device A800
code for TensorRT-LLM: latest version on the main branch
### Who can help?
@byshiue
### Information
- [X] The official example s…
-
### System Info
I encountered a trtllm-build issue.
GPU: RTX 3090
I followed the official script for the steps below.
1. I ran the code below after installing the NVIDIA Container Toolkit.
```
docker run -…
```
-
I have successfully converted a Mixtral 8x7B model with tensor parallelism, following this script from the llama example folder:
```
python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
    …
```
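For context, a typical tensor-parallel conversion followed by an engine build in the TensorRT-LLM llama example looks roughly like the following sketch. The output paths, the `tp_size` value, and the `--gemm_plugin` choice here are illustrative assumptions, not taken from the truncated command above:

```shell
# Sketch only: convert the HF checkpoint with 2-way tensor parallelism.
# Paths and tp_size are assumed values for illustration.
python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
    --output_dir ./tllm_checkpoint_mixtral_2gpu \
    --dtype float16 \
    --tp_size 2

# Build TensorRT engines from the converted checkpoint.
trtllm-build --checkpoint_dir ./tllm_checkpoint_mixtral_2gpu \
    --output_dir ./trt_engines/mixtral/tp2 \
    --gemm_plugin float16
```

With `--tp_size 2`, the resulting engines are meant to be run across two GPUs, e.g. via `mpirun -n 2`.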
-
Hi, thanks for your great work on the LLM decoding process. I tested the code and got the expected decoding speedup for Llama-2-7B, but the end-to-end time does not seem to change much. (61s ->…
-
Hi 👋, and thanks for the amazing work. I can't wait to see the developments in the next few weeks and months.
Any plans to work on attention sinks?