-
### Describe the issue
Can I run `python -m vllm.entrypoints.openai.api_server` to load MInference capabilities in vLLM?
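For reference, the offline usage I have in mind looks roughly like the sketch below. The `MInference("vllm", ...)` patching call is my assumption from memory of the MInference README, and I am not sure whether the `api_server` entrypoint applies such a patch on its own.
```python
# Rough sketch (assumptions noted): patch a vLLM LLM object with MInference
# before generation. The MInference(...) call is assumed from its README;
# the api_server CLI path may not perform this patching automatically.
from vllm import LLM, SamplingParams
from minference import MInference  # assumed import path

model_name = "Qwen/Qwen2-7B-Instruct"  # hypothetical model choice
llm = LLM(model_name, max_num_seqs=1, enforce_eager=True)

minference_patch = MInference("vllm", model_name)  # assumed patching interface
llm = minference_patch(llm)

outputs = llm.generate(
    ["Summarize this very long document ..."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```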
-
Has anyone done TensorRT inference acceleration for ASF-YOLO?
-
## 🚀 Feature
Please add Lookahead Decoding to mlc-llm in C++; we need it to speed up LLM decoding on **mobile devices**.
Reference: https://github.com/hao-ai-lab/LookaheadDecoding
## Motivation
…
-
Thanks for the FOSS!
Suggestion for possible future backend runtimes: Vulkan, OpenCL, SYCL/OpenVINO/Intel GPU, AMD GPU/ROCm/HIP.
Vulkan and OpenCL both have the possibility of being very port…
-
### Describe the issue
FP16 model inference is slower than FP32. Does FP16 inference require additional configuration, or is converting the model to FP16 enough?
### To reproduce
convert onnx …
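The conversion path I have in mind is roughly the sketch below; the file names and the CUDA provider are placeholders, and it assumes the `onnxconverter-common` package for the FP16 cast.
```python
# Sketch of a common FP32 -> FP16 conversion path (file names are hypothetical).
import onnx
import onnxruntime as ort
from onnxconverter_common import float16

model_fp32 = onnx.load("model_fp32.onnx")
# keep_io_types=True leaves graph inputs/outputs in FP32 so the calling code is unchanged.
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)
onnx.save(model_fp16, "model_fp16.onnx")

# FP16 usually only pays off on an execution provider with native half-precision
# support (e.g. CUDA); on the CPU provider FP16 ops are often emulated, which can
# make FP16 slower than FP32.
sess = ort.InferenceSession("model_fp16.onnx", providers=["CUDAExecutionProvider"])
```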
-
**LocalAI version:**
Using Docker image:
`localai/localai:latest-aio-gpu-hipblas`
**Environment, CPU architecture, OS, and Version:**
- Ubuntu 22.04
- Xeon X5570 [Specs](https://ark.intel.c…
-
Hi, I was wondering if there is any support for CPU inference. The sample script from hubconf.py doesn't run even after all the code instructing tensors and models to move to CUDA was removed per…
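To illustrate what I'm trying to achieve, here is a generic CPU-only PyTorch pattern; `TinyNet`, the checkpoint path, and the input shape are hypothetical stand-ins for whatever hubconf.py actually builds.
```python
# Generic CPU-only inference pattern in PyTorch. TinyNet is a placeholder model;
# the commented lines show how a checkpoint would be loaded without touching CUDA.
import torch
import torch.nn as nn

device = torch.device("cpu")

class TinyNet(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
        )

    def forward(self, x):
        return self.net(x)

model = TinyNet().to(device).eval()
# state = torch.load("weights.pth", map_location=device)  # map_location avoids CUDA deserialization
# model.load_state_dict(state)

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224, device=device)  # example input
    print(model(x).shape)
```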
-
```
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor, AutoConfig
from qwen_vl_utils import process_vision_info
import torch
model_name = "Qwen/Qwen2-VL-7B-I…
```
-
Good morning (or afternoon/evening)!
There is a methodology called **self-speculative decoding** among the techniques for speeding up LLM inference. Would it be possible to implement this …
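To illustrate the idea rather than any particular repo's API: in self-speculative decoding the draft tokens come from the same model with some layers skipped (early exit), and a single full forward pass then verifies them, so no separate draft model is needed. A toy greedy sketch, where `draft_next_token` and `full_model_logits` are hypothetical callables standing in for the cheap and full paths:
```python
# Toy greedy accept/verify loop behind (self-)speculative decoding.
# draft_next_token: cheap prediction (e.g. the same model with layers skipped).
# full_model_logits: full forward pass returning one row of scores per position.
from typing import Callable, List

def speculative_decode_greedy(
    prompt: List[int],
    draft_next_token: Callable[[List[int]], int],
    full_model_logits: Callable[[List[int]], List[List[float]]],
    max_new_tokens: int = 64,
    k: int = 4,  # tokens drafted per verification step
) -> List[int]:
    tokens = list(prompt)
    generated = 0
    while generated < max_new_tokens:
        # 1) Draft k tokens autoregressively with the cheap path.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next_token(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) One full forward pass scores every drafted position at once.
        logits = full_model_logits(tokens + draft)
        # 3) Accept drafted tokens while they match the full model's argmax;
        #    on the first mismatch, take the full model's token and stop.
        accepted = 0
        for i, t in enumerate(draft):
            pos = len(tokens) + i - 1  # logits at pos predict the token at pos + 1
            best = max(range(len(logits[pos])), key=lambda v: logits[pos][v])
            if best == t:
                accepted += 1
            else:
                draft[i] = best
                accepted += 1
                break
        tokens.extend(draft[:accepted])
        generated += accepted
    return tokens[: len(prompt) + max_new_tokens]
```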
-
## SHARK Studio Roadmap
This project establishes and tracks a plan for phased releases of the SHARK Studio WebUI.
There are three objectives of this roadmap:
- Define product features, support…