-
Hello,
first and foremost, I want to thank you for your incredible work!
I'd like further information on how to reproduce your code. I followed the code instructions in your README, but I am unabl…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
# URL
- https://arxiv.org/abs/2404.09937
# Affiliations
- Yuzhen Huang, N/A
- Jinghan Zhang, N/A
- Zifei Shan, N/A
- Junxian He, N/A
# Abstract
- There is a belief that learning to compress …
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)), and noticed a potentially important issue that arises when using LLMs with 1-…
-
### Bug Description
`MilvusVectorStore` fails to connect to a non-localhost URI when `enable_sparse` is `True`.
### Version
0.10.36
### Steps to Reproduce
For the following code:
```python
vector_store …
-
Hi there! Thanks for this amazing library. I was able to run a 70B model on my M2 MacBook Pro!
I got about one token every 100 seconds, which is almost good enough for my overnight task…
-
How feasible would it be to implement SpQR in ggml?
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
-
### Describe the bug
I used the code in the README and also in the notebook.
Check the code below.
### Steps to reproduce
```python
from langchain_community.document_loaders import TextLo…
-
AWQ is the SOTA quantization method. From what I have confirmed so far, I think it would be easy to add AWQ to AutoGPTQ, because its quantized-weight storage format is the same as GPTQ's.
https://githu…