-
1、Could you please tell me how long it is expected to take to generate 3 seconds of audio after the new feature update in August? Additionally, what is the technology behind SDPA for RTF optimizatio…
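SDPA here refers to scaled dot-product attention, i.e. computing softmax(QKᵀ/√d)·V in a single fused kernel (as in `torch.nn.functional.scaled_dot_product_attention`). A pure-Python sketch of the underlying math, with toy sizes and made-up values, just to show what the fused op computes:

```python
import math

def sdpa(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V are lists of row vectors (toy pure-Python sketch,
    not the fused GPU kernel itself)."""
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        # attention score of this query against every key
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]       # softmax over keys
        # output row = attention-weighted sum of value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# toy example: 2 queries, 2 keys/values, head dim d = 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(sdpa(Q, K, V))
```

The RTF gain comes from fusing these steps into one kernel (and avoiding materializing the full score matrix), not from changing the math above.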
-
Does Energon-AI support this project for inference optimization?
-
The focus is to implement type inference for JavaScript code and identify potential optimization opportunities based on the inferred types. While maintaining correctness, the goal is to explore variou…
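A minimal sketch of the idea: infer type tags for a toy JavaScript-like expression AST, so that a later pass could, for example, emit a plain numeric add when both operands are known numbers. The dict-based AST and `infer` function are hypothetical illustrations, not the project's actual representation:

```python
def infer(node):
    """Infer a type tag ('number', 'string', 'unknown') for a toy
    JavaScript-like expression AST given as nested dicts."""
    kind = node["kind"]
    if kind == "num":
        return "number"
    if kind == "str":
        return "string"
    if kind == "add":
        lt, rt = infer(node["left"]), infer(node["right"])
        # JS '+' is numeric addition only when both sides are numbers;
        # if either side is a string, it is concatenation
        if lt == rt == "number":
            return "number"
        if "string" in (lt, rt):
            return "string"
        return "unknown"
    return "unknown"

# (num + num) infers to 'number', so a backend could specialize the add
expr = {"kind": "add",
        "left": {"kind": "num"},
        "right": {"kind": "num"}}
print(infer(expr))
```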
-
### 起始日期 | Start Date
9/3/2024
### 实现PR | Implementation PR
_No response_
### 相关Issues | Reference Issues
_No response_
### 摘要 | Summary
When using vLLM to optimally utilize GPU space for faste…
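vLLM's GPU-space efficiency comes largely from its paged KV cache: free GPU memory is carved into fixed-size blocks of cached keys/values. The back-of-the-envelope arithmetic below is illustrative only (hypothetical model shape, not vLLM's actual code):

```python
def kv_cache_blocks(free_gpu_bytes, num_layers, num_kv_heads, head_dim,
                    block_size=16, dtype_bytes=2):
    """Rough count of PagedAttention-style KV cache blocks that fit in a
    memory budget. Each cached token stores one K and one V vector per
    layer; each block holds `block_size` tokens. Illustrative sketch."""
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes  # K and V
    bytes_per_block = block_size * bytes_per_token
    return free_gpu_bytes // bytes_per_block

# e.g. 8 GiB left for cache on a hypothetical 7B-class model shape
blocks = kv_cache_blocks(8 * 1024**3, num_layers=32, num_kv_heads=32, head_dim=128)
print(blocks, "blocks =", blocks * 16, "cacheable tokens")
```

This is why raising the fraction of GPU memory given to the cache directly increases how many sequences (and how much context) can be batched at once.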
-
### Describe the issue
I exported my medium Whisper model correctly, and it ran inference with the correct answer. After that, I optimized the model by running the command line: `python -m onnxrunti…
-
Is SyncTalk suitable for real-time inference?
Are there any stats about latency and performance?
Are there any benchmarks or optimization tips for real-time use?
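For real-time audio/video synthesis the usual latency metric is the real-time factor (RTF): wall-clock synthesis time divided by the duration of output produced, where RTF < 1.0 means the system keeps up with real time. A minimal measurement sketch (the `synthesize` callable is a stand-in for the actual model call):

```python
import time

def real_time_factor(synthesize, output_seconds):
    """RTF = wall-clock synthesis time / duration of output produced.
    RTF < 1.0 means generation is faster than real time."""
    t0 = time.perf_counter()
    synthesize()                      # stand-in for the model call
    elapsed = time.perf_counter() - t0
    return elapsed / output_seconds

# hypothetical stand-in: pretend producing 3 s of output takes ~0.03 s
rtf = real_time_factor(lambda: time.sleep(0.03), output_seconds=3.0)
print(f"RTF = {rtf:.3f}")
```

In practice one would also report per-frame latency percentiles, since a low average RTF can still hide occasional stalls that break a live stream.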
-
## Description
I tried to use the following module directly, tools/pytorch-quantization/pytorch_quantization/calib/histogram.py, and called HistogramCalibrator.compute_amax() to calculate the max…
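For context, one mode of histogram-based calibration picks the clipping threshold (amax) as a high percentile of the histogram of absolute activation values, so rare outliers don't blow up the quantization range. A pure-Python sketch of that percentile idea (not pytorch-quantization's actual implementation, and its `compute_amax` also supports entropy/MSE criteria):

```python
def percentile_amax(values, percentile=99.9, num_bins=2048):
    """Pick a clipping threshold (amax) as the given percentile of the
    histogram of absolute values. Illustrative sketch of the idea only."""
    mags = sorted(abs(v) for v in values)
    hi = mags[-1]
    if hi == 0:
        return 0.0
    width = hi / num_bins
    # histogram of magnitudes
    counts = [0] * num_bins
    for m in mags:
        idx = min(int(m / width), num_bins - 1)
        counts[idx] += 1
    # walk bins until the cumulative count reaches the percentile
    target = percentile / 100.0 * len(mags)
    total = 0
    for i, c in enumerate(counts):
        total += c
        if total >= target:
            return (i + 1) * width  # right edge of that bin
    return hi

data = [0.01 * i for i in range(1000)] + [100.0]  # one large outlier
print(percentile_amax(data))  # close to ~10, far below the 100.0 outlier
```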
-
Thank you for this excellent implementation. I'd like to suggest an optimization that could significantly speed up inference and enable streaming output.
Currently, there are two GPT2 graphs:
1.…
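The usual shape of this optimization is to split generation into a one-time context (prefill) pass that builds cached per-token state, plus a cheap incremental step that reuses the cache, so each new token costs O(1) graph work and can be streamed as soon as it is produced. A deliberately toy sketch of that control flow (the "cache" and token arithmetic are fake placeholders, not GPT-2):

```python
def prefill(tokens):
    """Toy 'context' pass: build per-token cached state once
    (stand-in for a transformer KV cache)."""
    return [t % 97 for t in tokens]

def decode_step(cache, token):
    """Toy incremental step: reuse the cache, append one entry, and
    emit one 'generated' token that can be streamed immediately."""
    cache.append(token % 97)
    return sum(cache) % 97            # fake next-token rule

cache = prefill([101, 102, 103])      # fake prompt token ids
stream = []
tok = 104
for _ in range(4):                    # generate 4 tokens
    tok = decode_step(cache, tok)     # O(cache append), not O(full recompute)
    stream.append(tok)                # stream each token as it is produced
print(stream)
```

The point of the two-graph split is exactly this: without the cached state, every generated token would rerun attention over the whole prefix.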
-
# Problem Description
In the Prefill stage (i.e., when outputting the first token), calculating logits for all token positions causes significant memory waste. With a vocabulary size of 152,064, the …
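Since only the last position's logits are needed to sample the first output token, the fix is to slice the final hidden state before the LM head. Illustrative arithmetic with the 152,064-entry vocabulary from this issue and an assumed fp16 (2-byte) dtype and 4,096-token prompt:

```python
def prefill_logits_bytes(seq_len, vocab_size=152_064, dtype_bytes=2):
    """Memory for prefill logits when computed at every position vs.
    only at the last position (the only one needed to sample token 1)."""
    all_positions = seq_len * vocab_size * dtype_bytes
    last_only = 1 * vocab_size * dtype_bytes
    return all_positions, last_only

full, last = prefill_logits_bytes(seq_len=4096)
print(f"all positions: {full / 2**20:.0f} MiB, last only: {last / 2**20:.2f} MiB")
```

The saving scales linearly with prompt length: the full-logits tensor is exactly `seq_len` times larger than the single row actually consumed.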
-
@brian-h-wang Hi, I have a few queries about the inference time.
Q1. In your notebook we get the timing "363 ms ± 21.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)". What does this statement mean?
Q2.…
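On Q1: that line is IPython's `%timeit` output. It ran the cell 7 separate times (1 loop per run), and reports the mean and standard deviation of the per-loop wall-clock time across those 7 runs, so 363 ms is the typical time for one inference and 21.9 ms the run-to-run spread. The same statistics can be reproduced with the standard library (the benchmarked statement here is a trivial stand-in):

```python
import statistics
import timeit

def bench(stmt, repeat=7, number=1):
    """Run `stmt` `repeat` times with `number` loops each, then report
    mean and std. dev. of the per-loop times -- what %timeit prints."""
    times = timeit.repeat(stmt, repeat=repeat, number=number)
    per_loop = [t / number for t in times]
    return statistics.mean(per_loop), statistics.stdev(per_loop)

mean, std = bench(lambda: sum(range(100_000)))
print(f"{mean * 1e3:.3f} ms ± {std * 1e3:.3f} ms per loop "
      f"(mean ± std. dev. of 7 runs, 1 loop each)")
```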