-
There are several projects aiming to make inference on CPU efficient.
The first part is research:
- Which project works best,
- Which is compatible with the Refact license,
- And which doesn't bloat the dock…
-
MLA (Multi-head Latent Attention) was proposed in [DeepSeek-V2](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/deepseek-v2-tech-report.pdf) for efficient inference.
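To illustrate the core idea only (not DeepSeek's exact implementation), here is a minimal PyTorch sketch of the low-rank latent KV compression that makes MLA cache-efficient; the module name and all dimensions are hypothetical, and the decoupled RoPE path from the paper is omitted.

```python
import torch
import torch.nn as nn

class LatentKVCompression(nn.Module):
    """Hypothetical sketch: compress hidden states into a small latent that is
    cached, then expand it back to per-head keys/values at attention time."""
    def __init__(self, d_model=512, n_heads=8, d_head=64, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.down_kv = nn.Linear(d_model, d_latent, bias=False)      # output is what gets cached
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

    def forward(self, h):                        # h: (batch, seq, d_model)
        latent = self.down_kv(h)                 # (batch, seq, d_latent) -> KV cache entry
        k = self.up_k(latent).unflatten(-1, (self.n_heads, self.d_head))
        v = self.up_v(latent).unflatten(-1, (self.n_heads, self.d_head))
        return latent, k, v                      # cache `latent`; recompute k/v from it

h = torch.randn(1, 16, 512)
latent, k, v = LatentKVCompression()(h)
print(latent.shape, k.shape)  # torch.Size([1, 16, 64]) torch.Size([1, 16, 8, 64])
```

The point of the sketch is that only the small latent is stored per token, so the KV cache shrinks roughly by the ratio of the full per-head KV width to the latent width.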
-
# AutoFocus: Efficient Multi-Scale Inference #
- Authors: Mahyar Najibi, Bharat Singh, Larry S. Davis
- Origin: https://arxiv.org/abs/1812.01600
- Related:
> This is 2.5X faster than our multi-…
-
### 🚀 The feature, motivation and pitch
For LLM inference, queries per second (QPS) are not constant, so the vLLM engine needs to be launched on demand. For elastic instances, it is significant to reduce TTFT (Time…
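As a rough illustration of why cold starts matter here, a minimal sketch (model name and prompt are placeholders) that times engine launch plus the first generated token with the offline `LLM` API:

```python
import time
from vllm import LLM, SamplingParams

t0 = time.perf_counter()
llm = LLM(model="facebook/opt-125m")                     # engine launch happens here
t_engine = time.perf_counter() - t0

t1 = time.perf_counter()
llm.generate(["Hello"], SamplingParams(max_tokens=1))    # first token of the first request
t_first_token = time.perf_counter() - t1

print(f"engine launch: {t_engine:.1f}s, first token: {t_first_token:.2f}s")
print(f"cold-start TTFT is roughly {t_engine + t_first_token:.1f}s")
```

For an on-demand (scale-from-zero) instance, the engine launch term dominates the first request's TTFT, which is the cost this feature request is about reducing.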
-
### 🚀 The feature, motivation and pitch
I'm working on ensembling multiple UNets with the method mentioned in [MODEL ENSEMBLING](https://pytorch.org/tutorials/intermediate/ensembling.html). This met…
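For context, a minimal sketch of that tutorial's approach applied to several identically-shaped networks (a tiny stand-in module is used here instead of a real UNet), using `torch.func.stack_module_state` together with `vmap`:

```python
import copy
import torch
from torch import nn
from torch.func import stack_module_state, functional_call

class TinyNet(nn.Module):                        # hypothetical stand-in for the UNet
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x):
        return self.conv(x)

models = [TinyNet() for _ in range(4)]
params, buffers = stack_module_state(models)     # stack all weights along a new leading dim
base = copy.deepcopy(models[0]).to("meta")       # stateless "skeleton" module

def call_one(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(2, 3, 8, 8)
# vmap over the stacked parameters; the same batch x is broadcast to every model
outs = torch.vmap(call_one, in_dims=(0, 0, None))(params, buffers, x)
print(outs.shape)  # (4, 2, 3, 8, 8): one prediction per ensembled model
```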
-
Hey guys, great work with this. We were wondering if, and (approximately) when, you will be releasing the multi-GPU inferencing. Furthermore, what is the time taken with default settings to run inference on a 6 …
-
(allegro) D:\PyShit\Allegro>python single_inference.py ^
More? --user_prompt "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats…
-
Recently, I noticed that the `SentenceTransformer` class has gained the ability to use the ONNX backend, which is incredibly beneficial for enhancing performance, especially on CPUs.
I would like …
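For reference, a minimal sketch of what using that backend looks like, assuming a recent sentence-transformers release with the `backend` argument and the ONNX extras (`optimum`, `onnxruntime`) installed; the model name is just an example:

```python
from sentence_transformers import SentenceTransformer

# Load the model with the ONNX backend instead of the default PyTorch one
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="onnx")

sentences = ["ONNX can speed up CPU inference.", "This is another sentence."]
embeddings = model.encode(sentences)
print(embeddings.shape)  # e.g. (2, 384) for this model
```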
-
## Model Zoo (we generally first implement USP and then PipeFusion for a new model)
Waiting for your comments.
## Scheduler
- [ ] Decouple VAE and DiT backbone. They can have different parallel …
-
Hi @HL-hanlin,
Thank you for your amazing work on Ctrl-Adapter! I was trying to run the code on a single NVIDIA 3090 GPU, but I ran into an OOM error. Could you please enlighten me as to what GPU resou…