-
Starting qwen2-instruct on Windows raises a KeyError.
Environment: Windows 10, Python 3.11.9.
The qwen2-instruct launch parameters are Transformers + PyTorch + model size 72 + quantization 8-bit.
The detailed error message is as follows:
2024-06-28 15:39:55,950 xinference.api.restful_api …
-
As part of the Llama 3.1 release, Meta is releasing an RFC for ‘Llama Stack’, a comprehensive set of interfaces / API for ML developers building on top of Llama foundation models. We are looking for f…
-
Due to the overwhelming number of published research papers, the list has become somewhat disorganized. As categories expand and mature, there's a clear need for more fine-grained organization. This d…
-
### Feature request
https://github.com/FasterDecoding/SnapKV
### Motivation
SnapKV: Cache compression technique for faster LLM generation with less compute and memory
In a recent paper, authors …
icyxp updated
4 months ago
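The SnapKV idea summarized above can be sketched in a few lines: attention scores from a small "observation window" at the end of the prompt vote for which earlier KV positions matter, and only the top-voted positions plus the window itself are kept. This is a minimal illustrative sketch, not the actual code from the linked repository; the function name and inputs are assumptions.

```python
def snapkv_keep_indices(attn, window, k):
    """Select which prompt KV-cache positions to keep (SnapKV-style sketch).

    attn:   attn[i][j] = attention weight from the i-th observation-window
            token to prompt position j (one head, already averaged).
    window: number of trailing prompt tokens that are always kept.
    k:      number of additional high-vote prefix positions to keep.
    """
    n_prompt = len(attn[0])
    # Vote: total attention each prefix position receives from the window.
    votes = [sum(row[j] for row in attn) for j in range(n_prompt - window)]
    top = sorted(range(len(votes)), key=votes.__getitem__, reverse=True)[:k]
    # Keep the selected prefix positions plus the observation window, in order.
    return sorted(top) + list(range(n_prompt - window, n_prompt))
```

With a 6-token prompt, a 2-token window, and k=2, the cache shrinks from 6 entries to 4 while retaining the positions the window attends to most.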
-
### The Feature
Similar to #3958 / #3533, LiteLLM _might_ get a performance boost by disabling gzip on upstream LLM requests.
See https://github.com/encode/httpx/discussions/2220#discussion-406389…
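The linked httpx discussion concerns the CPU cost of transparent gzip decompression. The underlying technique, shown here with Python's stdlib `urllib` rather than httpx or LiteLLM's own client code, is to advertise only `identity` in `Accept-Encoding` so the upstream server sends an uncompressed body:

```python
import urllib.request

def make_uncompressed_request(url):
    """Build a request that asks the server NOT to compress the response.

    Advertising only 'identity' in Accept-Encoding opts out of gzip,
    trading extra bandwidth for lower client-side CPU on large streamed
    LLM responses.
    """
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})

req = make_uncompressed_request("https://api.example.com/v1/chat/completions")
```

Whether this helps in practice depends on payload size and network speed, which is presumably why the snippet hedges with "might".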
-
First of all, thank you for your hard work in developing this evaluation method!
We have now added support for the llm-compression dataset and Bits per Character calculation in OpenCompass. OpenCom…
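For context, Bits per Character normalizes a model's total cross-entropy over a text by the character count, so models with different tokenizers are comparable. A minimal sketch of the metric follows; the function name and inputs are illustrative, not OpenCompass's actual API.

```python
import math

def bits_per_character(token_logprobs, text):
    """Compute Bits per Character (BPC).

    token_logprobs: natural-log probabilities the model assigned to each
    token of `text`. Total cross-entropy is converted from nats to bits
    and divided by the character count, so tokenizer granularity cancels.
    """
    total_bits = -sum(token_logprobs) / math.log(2)
    return total_bits / len(text)
```

For example, two tokens with probabilities 0.5 and 0.25 carry 1 + 2 = 3 bits; over a 4-character string that gives a BPC of 0.75.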
-
### Your current environment
I have two machines (2×4090). I wanted to run a model (e.g. gpt-neox-20b) with vLLM on a Ray cluster, so I followed the documentation to set up the Ray cluster
on head
ray star…
-
### What is the issue?
ollama build fails on undefined llama references
```
# github.com/ollama/ollama
/usr/local/go/pkg/tool/linux_s390x/link: running gcc failed: exit status 1
/usr/bin/ld: /t…
woale updated
1 month ago
-
While trying to compile the Llama 3 model (int8, IR version) using the OpenVINO compile method, we end up with the following error:
**RuntimeError: Exception from src\inference\src\cpp\core.cpp:109:
Excepti…
-
When running `llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml` to fine-tune, the following error occurs:
```
View detailed results here: /home/work/ray_results/TorchTrainer_20…