-
We are from the Human-Centered AI Lab at MBZUAI, and this is an amazing piece of work. Its only drawback is the high computational cost. A month ago, we used 48 A100 GPUs to reprod…
-
I have 800 papers, and I want paperqa to read most of them in order to give me an answer. But paperqa usually answers using fewer than 30 papers, of which only about 15 are relevant.
So, the question is how can …
-
# Platform (include target platform as well if cross-compiling):
ubuntu 20.04 cuda
Exporting the qwen2.5-0.5b model with the latest MNN 3.0: 4-bit quantization works correctly, but 8-bit quantization produces garbled output (regardless of whether "precision": "fp16" is set).
#…
-
As a developer, I want to monitor the audio processing pipeline and generate a detailed summary report of processing statistics, including error analysis and LLM cost tracking, so that we can identify…
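A minimal sketch of what such a statistics collector could look like. All names here (`PipelineStats`, `record`, `summary`) are hypothetical illustrations, not an existing API:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineStats:
    """Hypothetical collector for audio-pipeline processing statistics,
    covering error analysis and LLM cost tracking."""
    processed: int = 0
    errors: dict = field(default_factory=dict)  # error type -> count
    llm_cost_usd: float = 0.0

    def record(self, ok: bool, error_type: str = "", cost: float = 0.0) -> None:
        """Record one processed item, its failure category, and its LLM cost."""
        self.processed += 1
        if not ok:
            self.errors[error_type] = self.errors.get(error_type, 0) + 1
        self.llm_cost_usd += cost

    def summary(self) -> str:
        """Render a plain-text summary report."""
        failed = sum(self.errors.values())
        lines = [
            f"files processed: {self.processed}",
            f"failures: {failed}",
            f"LLM cost: ${self.llm_cost_usd:.4f}",
        ]
        for err, n in sorted(self.errors.items()):
            lines.append(f"  {err}: {n}")
        return "\n".join(lines)

stats = PipelineStats()
stats.record(ok=True, cost=0.002)
stats.record(ok=False, error_type="decode_error")
print(stats.summary())
```

The per-error-type counter is what enables the error analysis the story asks for; real pipelines would likely also bucket by stage and by model.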
-
**Describe the solution you'd like**
I’d like to add a new ranker component that leverages an LLM to rerank retrieved documents based on their relevance to the query. This would better assess the qual…
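A rough sketch of the shape such a component might take. The LLM call is stubbed out behind a `score_fn` callable (here a toy keyword-overlap scorer); every name is illustrative, not part of any existing framework:

```python
from typing import Callable, List, Tuple

def llm_rerank(query: str, docs: List[str],
               score_fn: Callable[[str, str], float],
               top_k: int = 5) -> List[Tuple[str, float]]:
    """Rerank retrieved documents by a relevance score.

    `score_fn` stands in for a real LLM call returning a relevance
    score in [0, 1] for a (query, document) pair.
    """
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy scorer: token overlap as a cheap stand-in for LLM judgment.
def overlap_score(query: str, doc: str) -> float:
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

ranked = llm_rerank(
    "vector database indexing",
    ["notes on indexing a vector database", "recipe for banana bread"],
    overlap_score,
    top_k=2,
)
```

Keeping the scorer injectable means the same ranker works with any LLM backend and stays unit-testable without network calls.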
-
# URL
- https://arxiv.org/abs/2401.02038
# Authors
- Yiheng Liu
- Hao He
- Tianle Han
- Xu Zhang
- Mengyuan Liu
- Jiaming Tian
- Yutong Zhang
- Jiaqi Wang
- Xiaohui Gao
- Tianyang …
-
### Describe the issue
I am running autogen with local LLMs. Is there any good way to turn off the warnings from autogen.oai.client, or any other warnings such as:
[autogen.oai.client: 09-11 21:12:41] {…
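Assuming the logger name matches the prefix shown in the message and autogen logs through Python's standard `logging` module, raising the threshold on that logger should silence it without touching other output:

```python
import logging

# Suppress WARNING-level records from the specific logger that emits
# the message shown above.
logging.getLogger("autogen.oai.client").setLevel(logging.ERROR)

# Or silence the whole autogen namespace at once; child loggers
# inherit the effective level from this parent.
logging.getLogger("autogen").setLevel(logging.ERROR)
```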
-
## ❓ General Questions
I tried to compile TVM and MLC-LLM on a Jetson AGX Orin (jp6 cu122) in order to run inference with phi3.5v. However, I discovered that phi3 processes images much more slowly than Hugging Face …
-
### 🚀 The feature, motivation and pitch
There is huge potential in more advanced load-balancing strategies tailored to the unique characteristics of AI inference, compared to basic strategies such …
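As one concrete example of a strategy beyond round-robin, here is a minimal least-outstanding-requests balancer: each request goes to the replica with the fewest in-flight requests, which matters for inference because request latencies vary widely. The class and replica names are illustrative:

```python
class LeastOutstandingBalancer:
    """Route each request to the replica with the fewest in-flight
    requests. A sketch only; real balancers also need health checks,
    timeouts, and thread safety."""

    def __init__(self, replicas):
        self._outstanding = {r: 0 for r in replicas}

    def acquire(self) -> str:
        """Pick the least-loaded replica (first wins a tie) and mark
        one more request in flight on it."""
        replica = min(self._outstanding, key=self._outstanding.get)
        self._outstanding[replica] += 1
        return replica

    def release(self, replica: str) -> None:
        """Mark a request on `replica` as finished."""
        self._outstanding[replica] -= 1

balancer = LeastOutstandingBalancer(["replica-0", "replica-1"])
a = balancer.acquire()  # both idle, so the first replica wins the tie
b = balancer.acquire()  # replica-0 now busy, so replica-1 is chosen
balancer.release(a)     # replica-0 finishes and becomes least loaded again
```

Production variants typically weight by queued tokens or estimated decode time rather than raw request counts, since inference requests are far from uniform in cost.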
-
2024-11-12 17:30:29.957 | WARNING | metagpt.utils.cost_manager:update_cost:49 - Model Qwen/Qwen2.5-Coder-32B-Instruct not found in TOKEN_COSTS.
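One hedged workaround for a warning like this, assuming `TOKEN_COSTS` is a plain dict mapping model names to per-1k-token prices (as the `update_cost` reference in the message suggests), is to register the missing model before running. The dict below is a local stand-in; the real table's import path inside metagpt is an assumption not relied on here:

```python
# Local stand-in mirroring the structure the warning implies; the real
# TOKEN_COSTS dict lives inside metagpt (exact path and schema are
# assumptions).
TOKEN_COSTS = {
    "gpt-4o": {"prompt": 0.005, "completion": 0.015},  # USD per 1k tokens
}

def register_model(costs: dict, model: str,
                   prompt_price: float, completion_price: float) -> None:
    """Register a model so cost tracking no longer warns it is unknown."""
    costs[model] = {"prompt": prompt_price, "completion": completion_price}

# Self-hosted models are often tracked at zero API cost.
register_model(TOKEN_COSTS, "Qwen/Qwen2.5-Coder-32B-Instruct", 0.0, 0.0)
```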