-
**Description**
Triton does not clear or release GPU memory when there is a pause in inference. The attached diagrams show the same model in use; it is served via ONNX.
![image (1)](https:…
-
**Problem**
I need to generate a large number of small JSON documents with an LLM. To do so I started with [Jsonformer](https://github.com/1rgs/jsonformer). However, since this is no longer maintained and my colleagu…
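For context on the approach, here is a minimal pure-Python sketch of the core idea behind schema-constrained generation as Jsonformer popularized it: walk the JSON schema and ask the model only for individual values, so the output is valid JSON by construction. The `generate_value` stub below stands in for a real LLM call; its names and placeholder return values are assumptions for illustration, not Jsonformer's actual API.

```python
import json

def generate_value(field_name: str, field_type: str) -> object:
    # Placeholder "model": a real implementation would prompt the LLM with
    # the partial JSON built so far and constrain decoding to this type.
    samples = {"string": f"<{field_name}>", "number": 0, "boolean": True}
    return samples[field_type]

def fill_schema(schema: dict) -> dict:
    # Walk a (simplified) JSON Schema and build the object field by field,
    # recursing into nested objects. Because we assemble the structure
    # ourselves, the result is always syntactically valid JSON.
    result = {}
    for name, spec in schema["properties"].items():
        if spec["type"] == "object":
            result[name] = fill_schema(spec)
        else:
            result[name] = generate_value(name, spec["type"])
    return result

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "active": {"type": "boolean"},
    },
}

obj = fill_schema(schema)
print(json.dumps(obj))
```

The point of the design is that the model never emits braces, quotes, or commas itself, so malformed output is impossible even for small models.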
-
Thank you very much for your open-source contribution; it is very helpful for my current work!
However, I encountered some problems. In version 1.2, when running inference with the 93x480p version on an A800 80G …
-
-
I found that fastpath inference does not seem to be supported, although it is an optimization used in
https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/transformer.py#L224.
May I know what is the rea…
-
```julia
module Foo
using Base.Experimental: @opaque
some_method(x) = 2x
make_oc() = @opaque (x::Int)->some_method(x)
precompile(make_oc, ())
end # module Foo
```
When using this, I …
-
Hi,
I tried to run the 7B INT4 LLM model on the NPU, but the performance was not very good, only about 2 tokens/s. One possible reason may be that the NPU has been loading …
-
Our company
Company: US-Based 💵 Annual Compensation: $100k - $140k USD (Approx. R$550k - R$750k)
Job description
🔍 Responsibilities:
Build tools to monitor inference infrastructure performan…
-
**Describe the bug**
Unable to optimize a model with device `cpu` and precision `int8`; the run ends with a `KeyError: 'input_model'`.
**To Reproduce**
Start with this example: https://github.com/micr…
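A `KeyError: 'input_model'` typically means the tool could not find a top-level `input_model` section in the configuration JSON (missing, misspelled, or nested in the wrong place). As a minimal sketch of the expected shape, assuming a PyTorch source model, the specific type names and paths below are illustrative assumptions, not a verified configuration:

```json
{
  "input_model": {
    "type": "PyTorchModel",
    "config": { "model_path": "model.pt" }
  },
  "passes": {
    "quantize": { "type": "OnnxQuantization" }
  }
}
```

Checking that the config file actually loaded (correct path, valid JSON) and that the `input_model` key sits at the top level is a reasonable first step before digging further.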
-
In my current tests, inference is very slow. I am using an A100 for inference, and generating 1 minute of audio can sometimes take close to 1 minute of inference time. Are there any ways to optimize this?