llm-compression Search Results

614 results
for llm-compression

Best match


run-llama/llama_index #13447

[Bug]: MilvusVectorStore failed to connect non-localhost uri…

### Bug Description The `MilvusVectorStore` failed to connect non-localhost uri when `enable_sparse` is `True` ### Version 0.10.36 ### Steps to Reproduce For the codes ```python vector_store …

richzw updated 1 month ago
3
shadowpa0327/Palu #5

[Performance] Model Performance Degradation with Palu Compre…

Hello! I have been really excited about your work! I attempted to use Palu for model compression on the Qwen2 series models, but regardless of the compression rate I set, I seem to encounter signif…

jiangguochaoGG updated 2 weeks ago
2
vllm-project/llm-compressor #35

Mixtral 8*22B Quantization Failed with 2 issues

**Describe the bug** A clear and concise description of what the bug is. Hey Team, trying to quantize mistral 8*22b with W8A8 recipe and failed with two issues with different versions: 1) `…

qingquansong updated 2 months ago
21
AIoT-MLSys-Lab/SVD-LLM #9

Compressed Model Produces Random and Repetitive Output - Req…

Description When running the code, we successfully obtain a compressed model. However, when prompted with an input, the model generates random and repetitive outputs, often repeating the same letters…

codeit1792 updated 3 months ago
1
JupyPod/Codepod #1

Reliable exporting and importing for backup and restore

Corner case 1: The IPynb below is made in VSCode. But once imported into Codepod, 1. all source code is gone. 2. HTML is not properly rendered. ![image](https://github.com/RunVas/RunVas/ass…

forrestbao updated 1 year ago
1
vcskaushik/LLMzip #4

Questions about arithmetic coding

Thank you for your excellent work and code. I have a few questions. Regarding the arithmetic coding used, how did you determine the precision? Are you using infinite precision or finite precision?…

Jone-Luo updated 3 months ago
1
huggingface/transformers #27649

Adding support for lookahead decoding for autoregressive (de…

### Feature request Fu et al. propose a novel decoding technique that accelerates greedy decoding on Llama 2 and Code-Llama by 1.5-2x across various parameters sizes, without a draft model. This meth…

shermansiu updated 10 months ago
9
Borketh/hardqoi #4

Optimized decoder for WebAssembly

Feel free to simply close out this issue if you are not interested but we just implemented QOI image format for VNC to deliver lossless remote desktops using Rust WASM clientside here: https://githu…

thelamer updated 1 year ago
8
uds-lsv/multilingual-icl-analysis #1

Troubles on XQuAD dataset result

Hello, Thank you for sharing your implementation. It has been very helpful for me! :) I have a quick question. I cloned your implementation and obtained the following image results. Howeve…

sumyeongahn updated 2 weeks ago
19
flashinfer-ai/flashinfer #367

[Feature request] Sparse Attention

Recently, we have seen several great works focusing on KV cache compression, and they claim a 1.7~2.3x speedup over FlashInfer. Could you please consider supporting such features? Same layer KV…

Ageliss updated 3 months ago
5

