-
### Name and Version
bitnami/vllm 0.1.0
### What is the problem this feature will solve?
Add a Helm chart for vLLM, a high-throughput and memory-efficient inference and serving engine for …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### 🚀 The feature, motivation and pitch
DeepSeek-V2 introduces **MLA (Multi-head Latent Attention)**, which uses low-rank key-value joint compression to eliminate the bottleneck of inference-time key…
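The idea behind MLA can be illustrated with a toy sketch: the per-token hidden state is projected down to a small latent vector, and only that latent is cached; keys and values are re-expanded from it at attention time. The dimensions, matrix names, and projection layout below are illustrative assumptions, not DeepSeek-V2's actual implementation.

```python
# Toy sketch of MLA-style low-rank KV compression (pure Python, illustrative only).
import random

d_model, d_latent = 8, 2  # latent dim << model dim

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    # plain matrix-vector product over Python lists
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

W_down = rand_matrix(d_latent, d_model)  # compress hidden state -> latent
W_up_k = rand_matrix(d_model, d_latent)  # reconstruct keys from latent
W_up_v = rand_matrix(d_model, d_latent)  # reconstruct values from latent

h = [random.gauss(0, 1) for _ in range(d_model)]  # one token's hidden state
c = matvec(W_down, h)  # only this d_latent vector is cached per token
k = matvec(W_up_k, c)  # keys/values re-expanded from the shared latent
v = matvec(W_up_v, c)
```

The KV cache then stores `d_latent` floats per token instead of two full `d_model` vectors, which is where the inference-time memory saving comes from.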
-
RuntimeError: invalid vector subscript
2024-10-22 15:13:07,066- root:393- ERROR- Traceback (most recent call last):
File "H:\sd\comfy-torch-2.1.2+cu118\execution.py", line 323…
-
See the preprint [here](https://openreview.net/forum?id=G1hjFDre0NF).
It will be useful for few-shot in-context learning scenarios, models finetuned with prefix-tuning, and generally those LLM applicati…
-
I have just read your paper "LINA-SPEECH: GATED LINEAR ATTENTION IS A FAST AND PARAMETER-EFFICIENT LEARNER FOR TEXT-TO-SPEECH SYNTHESIS" and I must say, I am truly amazed by the effectiveness of your …
-
**Original article: Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection**
**PDF URL: https://github.com/HitkoDev/energy-efficient-ml/blob/master/article.pdf**
*…
-
### **Adaptation for macOS and Mobile Devices**
Given the model's relatively small parameter size and efficient performance, I was wondering if there are any plans to adapt it for macOS devices with …
-
Functional discussion for this project.
[notebooks/llm-chatbot](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot)
Intel's official documentation: https://www…
-
I want to perform inference on quantized LLAMA (W8A16) on ARM-v9 (with SVE) using oneDNN. The LLAMA weights are per-group quantized.
Based on my understanding, I need to prepack the weights to redu…
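For context on what per-group W8A16 quantization means here, a minimal sketch: each group of consecutive int8 weights shares one floating-point scale, and activations stay in 16-bit float. The group size, function names, and rounding scheme below are assumptions for illustration, not oneDNN's API or LLAMA's actual quantization recipe.

```python
# Minimal sketch of per-group int8 weight quantization (W8A16-style).
GROUP = 4  # assumed group size; real models often use 64 or 128

def quantize_per_group(w, group=GROUP):
    """Quantize floats to int8, one absmax scale per group of `group` weights."""
    q, scales = [], []
    for i in range(0, len(w), group):
        g = w[i:i + group]
        scale = max(abs(x) for x in g) / 127 or 1e-8  # avoid zero scale
        scales.append(scale)
        q.extend(max(-128, min(127, round(x / scale))) for x in g)
    return q, scales

def dequantize_per_group(q, scales, group=GROUP):
    # each weight is rescaled by its group's shared scale
    return [q[i] * scales[i // group] for i in range(len(q))]

w = [0.5, -1.0, 0.25, 0.75, 2.0, -2.0, 1.0, 0.0]
q, s = quantize_per_group(w)
w_hat = dequantize_per_group(q, s)
```

Prepacking in oneDNN would additionally reorder these quantized weights into the library's preferred blocked layout once, ahead of time, so the matmul kernel does not pay that cost on every inference call.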