-
### System Info
Google Colab with GPU T4 and CUDA 12.2.
TensorRT-LLM version: 0.9.0.dev2024040200.
Here is the [minimum reproducible notebook](https://colab.research.google.com/drive/1xAxZKYHx_Qq4g…
-
I would like to express my gratitude for your paper and code, which have been truly enlightening for me. I conducted the experiments following the instructions provided in the README. I would be grate…
-
Hi,
Thanks for the amazing work on streaming-llm. While reading the paper, I came up with a question about why applying the "attention sink" also works for models with ALiBi position embeddings.
One o…
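For context, the eviction rule the paper describes (pin a few initial "sink" tokens, keep a recency window, drop everything in between) can be sketched roughly like this; `evict`, `n_sink`, and `window` are illustrative names, not streaming-llm's actual API:

```python
def evict(cache, n_sink=4, window=1020):
    """Attention-sink style eviction over a per-token KV cache.

    cache: list of per-token KV entries, oldest first (illustrative layout).
    Keeps the first `n_sink` entries plus the most recent `window` entries.
    """
    if len(cache) <= n_sink + window:
        return cache  # nothing to evict yet
    return cache[:n_sink] + cache[-window:]
```

The open question is whether this sink-pinning step interacts with ALiBi's relative biases the same way it does with rotary embeddings.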
-
Firstly, I'd like to express my appreciation for your insightful paper and the open-source 'streaming-llm'. Your approach and experiments are truly commendable. I hope you don't mind; I would really a…
-
Nice work!
I am wondering whether this attention sink magic is still needed for LLMs that have already been trained with window attention (e.g. [mistral](https://github.com/mistralai/mistral-src)). …
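For comparison, window attention as trained into models like Mistral corresponds to pure sliding-window eviction with no pinned initial tokens. A rough sketch (illustrative, not Mistral's actual code), which makes the contrast with the attention-sink rule concrete:

```python
def sliding_window_evict(cache, window=4096):
    """Pure sliding-window eviction: keep only the most recent `window`
    per-token KV entries; the earliest tokens are dropped entirely."""
    return cache[-window:]
```

The question then is whether models trained under this eviction pattern still benefit from additionally pinning the first few tokens at inference time.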
-
### Have you searched for similar requests?
Yes
### Is your feature request related to a problem? If so, please describe.
llama.cpp has the feature to re-use the context window if the beginning of …
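The reuse check being requested amounts to finding the longest shared token prefix between the cached context and the new prompt, so everything up to the first mismatch can be kept without recomputation. A minimal sketch (hypothetical helper, not llama.cpp's implementation):

```python
def reusable_prefix(cached_tokens, new_tokens):
    """Return the length of the shared prefix between the cached context
    and the new prompt; KV entries up to this point can be reused."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n
```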
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is this question answered in the FAQ? | Is there an existing…
-
I naively tried adding examples to https://github.com/mit-han-lab/streaming-llm/blob/main/data/mt_bench.jsonl, including examples with a length of 4k tokens, without changing anything in the script. I r…
-
Hi
https://colab.research.google.com/drive/1YtXE_JKVntkGK14Yo9thjCjPMVzhA71d?usp=sharing
Here is the Colab notebook, but it doesn't run to completion: it stops after a while due to memory overload or something…
-
I ran some tests on int8_kv_cache:
> The test model is mistral-7b.
> My test inference code is based on `run.py`, with timing statistics added around `runner.generate` and warm-up code added.
> Input…
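For reference, the warm-up-plus-timing pattern looks roughly like this (`timed_generate` and its arguments are illustrative names, not `run.py`'s actual code):

```python
import time

def timed_generate(runner, inputs, warmup=3, iters=10):
    """Benchmark helper: run a few warm-up generations so one-time setup
    (CUDA context, kernel autotuning, allocations) is excluded, then
    average wall-clock time over `iters` timed runs."""
    for _ in range(warmup):
        runner.generate(inputs)  # discarded warm-up runs
    start = time.perf_counter()
    for _ in range(iters):
        runner.generate(inputs)
    return (time.perf_counter() - start) / iters
```

Without the warm-up runs, the first call's setup cost inflates the measured latency and makes int8 vs fp16 comparisons misleading.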