-
I tested the inference speed of LLaMA-7B with bitsandbytes-0.40 on an A100-80G. I found that the speed of `nf4` has greatly improved compared to QLoRA. However, the speed of `nf4` is still slower than `fp1…
-
I attempted to fine-tune a 6-billion-parameter model using 8 A100 GPUs, but the training process was interrupted. On the first attempt, it stopped at 0.15 epochs, and on the second attempt, …
-
The current Lambda zip deployment has a size limit of 250 MB, which prevents the use of large pre-trained models in the similarity engine's Lambda deployment on the AWS cloud environment. After research, I will e…
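One common way around the 250 MB zip limit, in case it is the route being explored: AWS Lambda also supports container-image deployments with a 10 GB image size limit, so a large pre-trained model can be baked into the image. A minimal sketch (the `model/`, `app.py`, and `requirements.txt` paths are placeholders for this project's actual layout):

```dockerfile
# Hypothetical sketch: package the similarity engine as a Lambda container image
FROM public.ecr.aws/lambda/python:3.11

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Bake the large pre-trained model and handler code into the image
COPY model/ ${LAMBDA_TASK_ROOT}/model/
COPY app.py ${LAMBDA_TASK_ROOT}/

# Lambda invokes the handler as "module.function"
CMD ["app.handler"]
```

The image is pushed to ECR and the Lambda function is created from it, which sidesteps the zip size limit entirely.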
-
### Context
This issue proposes adding a test to the [post-training compression conformance suite](https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/README.md) to verify that t…
-
### System Info
Hi, I am using `LLMChainFilter.from_llm(llm)`, but while running it, I am getting this error:
ValueError: BooleanOutputParser expected output value to either be YES or NO. Received Yes, …
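For what it's worth, the error message suggests the parser is doing an exact string match against `YES`/`NO` while the model answered `Yes`. A minimal, library-independent sketch of the case-insensitive comparison that would avoid this (the `parse_boolean` helper is illustrative, not LangChain's actual API):

```python
def parse_boolean(text: str) -> bool:
    """Parse an LLM yes/no answer, tolerating case and surrounding whitespace."""
    cleaned = text.strip().upper()
    if cleaned == "YES":
        return True
    if cleaned == "NO":
        return False
    raise ValueError(f"Expected YES or NO, received: {text!r}")

print(parse_boolean("Yes"))  # accepts mixed case instead of raising
```

A prompt-side workaround is also possible: instruct the model to answer with exactly `YES` or `NO` in uppercase.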
-
Thanks for the great work!
I want to recommend a new KD method: weight inheritance.
Name: Weight-Inherited Distillation for Task-Agnostic BERT Compression
code: https://github.com/wutai…
-
### Bug Description
The llamaindex RAG demo is no longer functioning properly due to significant changes in library calls after updating llamaindex to version 0.10. Could you help me troubleshoot whe…
-
I tried to run the LLaMA model with two A6000 96GB cards and two GV100 cards, but CUDA throws an error.
Single-card BERT runs fine, but as soon as I switch to two cards, the error starts in the `source_embedding` forward pass:
source_embeddings = self.mapping_layer(self.word_embeddings.permute(1, 0)).permute(1, 0)
The error is as follows. Has anyone encountered…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
-
We currently have about 7,500 hours of oral argument audio without transcriptions. We need to go through these audio files and run a speech-to-text tool on them. This would have massive benefits:
- Ale…
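A possible shape for the batch job, sketched with a pluggable `transcribe_fn` so any speech-to-text backend (e.g. a Whisper wrapper) can be dropped in; the function and directory names here are illustrative, not an existing tool in this repository:

```python
from pathlib import Path
from typing import Callable

def transcribe_missing(audio_dir: Path, out_dir: Path,
                       transcribe_fn: Callable[[Path], str]) -> int:
    """Transcribe every audio file that does not yet have a .txt transcript.

    Returns the number of files transcribed. Skipping already-done files
    makes a 7,500-hour batch job safe to resume after interruptions.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    done = 0
    for audio in sorted(audio_dir.glob("*.mp3")):
        transcript_path = out_dir / (audio.stem + ".txt")
        if transcript_path.exists():
            continue  # already processed in a previous run
        transcript_path.write_text(transcribe_fn(audio))
        done += 1
    return done
```

Because each transcript is written as its own file, the job can be parallelized across machines by sharding the audio file list.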