-
Vulkan may not be the best, fastest, or easiest solution for inference, but it is probably the most portable GPU-acceleration approach.
Is anyone actively working on adding support for it? And if so, wha…
-
**Is your feature request related to a problem? Please describe.**
Currently, all code profiling is custom-built on internal functions from psutil and/or PyTorch. It would be great to have a more…
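For context, the kind of ad-hoc profiling the request describes can be sketched with the standard library alone (the actual project code reportedly uses psutil and/or PyTorch internals; the decorator and names below are illustrative, not from the repository):

```python
import time
import tracemalloc
from functools import wraps

def profile(fn):
    """Record wall time and peak traced memory for each call to fn.

    A stdlib-only stand-in for custom psutil/PyTorch-based profiling.
    """
    stats = {"calls": 0, "total_s": 0.0, "peak_bytes": 0}

    @wraps(fn)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats["total_s"] += time.perf_counter() - t0
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            stats["calls"] += 1
            stats["peak_bytes"] = max(stats["peak_bytes"], peak)

    wrapper.stats = stats  # expose collected metrics on the wrapped function
    return wrapper

@profile
def work(n):
    return sum(range(n))

work(100_000)
print(work.stats["calls"])  # 1
```

A dedicated profiling library would replace this hand-rolled bookkeeping with a uniform interface, which is presumably the point of the request.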
-
Are there any runnable demos of using Sparse QAT/PTQ (2:4) to accelerate inference, such as applying PTQ to a 2:4-sparse LLaMA for inference acceleration? I am curious about the potential speedup rati…
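For reference, the 2:4 pattern the question refers to keeps at most 2 nonzero values in every contiguous group of 4 weights. A minimal magnitude-based pure-Python sketch of the pruning step (illustrative only; not the NVIDIA ASP or any production implementation):

```python
def prune_2_4(weights):
    """Apply 2:4 structured sparsity: in every group of 4 consecutive
    weights, keep the 2 with the largest magnitude and zero the rest."""
    out = list(weights)
    for i in range(0, len(out) - len(out) % 4, 4):
        group = out[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        for j in range(4):
            if j not in keep:
                out[i + j] = 0.0
    return out

print(prune_2_4([0.1, -0.9, 0.5, 0.05]))  # [0.0, -0.9, 0.5, 0.0]
```

The speedup in practice comes from hardware (e.g. Ampere sparse tensor cores) exploiting this fixed pattern, not from the pruning itself.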
-
Thanks for your interesting work! I tried to run inference on a 75-frame video at 512×768 on an A100, which takes about 3 minutes. I also tried using more cards; however, that only generates more videos :( . D…
-
Has anyone tried exporting the model to ONNX and running inference with ONNX Runtime?
-
### The bug
Clicking "Empty Recycle Bin" in the "Recycle Bin" section prompts "0 items have been permanently deleted", making it impossible to empty the Recycle Bin.
![image](https://g…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui
### What happened?
Thank you for this p…
-
**Describe the bug**
After changing llama3 to llama3.1 as "meta-llama/Meta-Llama-3.1-8B-Instruct":
The model downloads successfully. However, it raises an error when the input is sent.
> The a…
-
### 🚀 The feature, motivation and pitch
DeepSpeed-FP6: An Optimization Approach from Microsoft
### Alternatives
Microsoft recently proposed an optimization approach called DeepSpeed-FP6. While it c…
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.…