-
Inference submodule
- [ ] ir
- [ ] manual graph construction
- [ ] automatic graph construction
- [ ] model conversion
- [ ] model interpretation
- [ ] computation graph building
- [ ] graph optimization
- [ ] memory optimization
- [ ] high-performance operators
-
# I want to evaluate the accuracy of a GGUF model using llama.cpp as the inference framework
## Use these commands:
./llama-server -m /root/ICAS_test/models/Qwen-1_8B-Q8_0.gguf
lm_eval --model gguf …
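For reference, the same evaluation can also be driven from Python. The sketch below assumes lm-evaluation-harness's `gguf` backend pointed at an already-running llama-server; the port (llama-server's default 8080) and the task name (`hellaswag`) are assumptions, not taken from the commands above:

```python
# Minimal sketch: evaluate a GGUF model served by llama-server via
# lm-evaluation-harness's "gguf" backend. Port and task are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="gguf",                                 # backend that queries a llama.cpp server
    model_args="base_url=http://localhost:8080",  # assumed llama-server address
    tasks=["hellaswag"],                          # illustration task
)
print(results["results"])
```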
-
I am currently trying to reproduce the results shown in Figure 4 - Inference Time vs Vocabulary Size from your project. I have a couple of questions regarding the methodology used for this figure:
…
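I do not know the project's actual benchmark harness, but for concreteness, a generic way to measure how per-step inference time scales with vocabulary size is to time the output (LM-head) projection alone, since that is the component whose cost grows with the vocabulary. All sizes below are made-up illustration values:

```python
# Generic sketch (not the project's code): time the LM-head projection
# for several vocabulary sizes using CUDA events.
import torch

hidden, batch = 4096, 1
x = torch.randn(batch, hidden, device="cuda", dtype=torch.float16)

for vocab in (32_000, 64_000, 128_000, 256_000):
    lm_head = torch.randn(hidden, vocab, device="cuda", dtype=torch.float16)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(3):                     # warm-up iterations
        x @ lm_head
    start.record()
    for _ in range(100):
        x @ lm_head
    end.record()
    torch.cuda.synchronize()
    print(f"vocab={vocab}: {start.elapsed_time(end) / 100:.3f} ms/step")
```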
-
1. **Prerequisite:** Make sure the LLM inference framework can be launched in SPMD style. For example, the LLM inference script can be launched by `torchrun --standalone --nproc=8 offline_i…
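A minimal SPMD-style script looks like the sketch below: every rank runs the same program and discovers its identity from the environment variables torchrun sets. The file name `spmd_demo.py` is hypothetical; launch with `torchrun --standalone --nproc_per_node=8 spmd_demo.py`:

```python
# Minimal SPMD sketch for torchrun: same program on every rank,
# identity taken from torchrun's environment variables.
import os

import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")  # reads RANK/WORLD_SIZE set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # Each rank would load its model shard and run its slice of inference here.
    print(f"rank {dist.get_rank()}/{dist.get_world_size()} ready on cuda:{local_rank}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```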
-
Have you considered incorporating this work into an open source inference framework, such as vLLM?
-
# OPEA Inference Microservices Integration for LangChain
This RFC proposes the integration of OPEA inference microservices (from GenAIComps) into LangChain [extensible to other frameworks], enabli…
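On the LangChain side, the integration could look like the sketch below: a custom LLM that forwards prompts to an OPEA inference microservice over HTTP. The endpoint URL, path, and JSON payload/response shape are assumptions for illustration, not the actual GenAIComps API:

```python
# Hypothetical LangChain wrapper for an OPEA inference microservice.
# Endpoint and payload shape are assumed, not the real GenAIComps contract.
from typing import Any, List, Optional

import requests
from langchain_core.language_models.llms import LLM


class OPEAInferenceLLM(LLM):
    """Sketch of an LLM backed by an OPEA microservice."""

    endpoint: str = "http://localhost:9000/v1/chat/completions"  # assumed URL

    @property
    def _llm_type(self) -> str:
        return "opea-inference"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        resp = requests.post(self.endpoint, json={"query": prompt}, timeout=60)
        resp.raise_for_status()
        return resp.json()["text"]  # assumed response field


llm = OPEAInferenceLLM()
# print(llm.invoke("What is OPEA?"))  # requires a running microservice
```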
-
### Request Description
Llama.cpp is a very popular and excellent LLM/VLM inference and deployment framework: it is implemented in pure C/C++, has no dependencies, and is cross-platform. Based on SYCL and Vu…
-
Hi.
I have a question regarding the prefetch implementation in your framework.
As I understand it, prefetching and inference should ideally run concurrently in separate CUDA streams. I noticed t…
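This is not the framework's actual code, but the pattern the question describes is typically implemented along these lines in PyTorch: host-to-device prefetch runs on a side stream while compute stays on the default stream, and CUDA events make each compute step wait only for the copy it depends on:

```python
# Generic sketch of overlapping weight prefetch (side stream) with
# compute (default stream), synchronized via CUDA events.
import torch

prefetch_stream = torch.cuda.Stream()

cpu_weights = [torch.randn(4096, 4096, pin_memory=True) for _ in range(4)]
gpu_weights = [None] * len(cpu_weights)
ready = [torch.cuda.Event() for _ in cpu_weights]
x = torch.randn(8, 4096, device="cuda")


def prefetch(i):
    # Async H2D copy on the side stream; record an event when it is queued.
    with torch.cuda.stream(prefetch_stream):
        gpu_weights[i] = cpu_weights[i].to("cuda", non_blocking=True)
        ready[i].record()


prefetch(0)
for i in range(len(cpu_weights)):
    if i + 1 < len(cpu_weights):
        prefetch(i + 1)  # overlap the next layer's copy with this layer's compute
    torch.cuda.current_stream().wait_event(ready[i])  # wait only for layer i's copy
    x = x @ gpu_weights[i]
torch.cuda.synchronize()
```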
-
How can Accelerate be used to split a model across multiple GPUs placed on different nodes for inference? If it cannot, what other frameworks can do this?
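For the single-node part of this question, Accelerate's big-model inference splits a model across all visible GPUs via a device map; the sketch below uses it through transformers, which delegates placement to Accelerate (the model id is an illustration value). As far as I know, `device_map` only spans the GPUs of one node; splitting a single model across nodes needs pipeline or tensor parallelism from frameworks such as DeepSpeed inference or vLLM:

```python
# Sketch: shard a model across the GPUs of one node with device_map="auto"
# (placement handled by Accelerate under the hood).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustration value

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # Accelerate spreads layers over all visible GPUs
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```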
-
### Motivation
1. The qwen2vl model is at the SOTA level among open-source models.
2. lmdeploy is an excellent inference framework.
3. So it is important for turbomind to support qwen2vl.
### Related resources
_No re…