-
## Goal
- Jan supports most llama.cpp params
## Tasklist
**Cortex**
- [x] https://github.com/janhq/cortex.cpp/issues/1151
**Jan**
- [ ] Update Right Sidebar UX for Jan
- [ ] Enable Jan's API serv…
-
spec-infer works well for batch sizes (1, 2, 4, 8, 16), but when I change the batch size to 32 it aborts with "stack smashing detected":
```
+ ncpus=16
+ ngpus=1
+ fsize=30000
+ zsize=60000
+ max_se…
```
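For what it's worth, "stack smashing detected" is glibc's stack-protector abort, which usually means a fixed-size stack buffer was overrun once the batch size exceeded what it was sized for. A minimal sketch for bisecting the first failing size, where the `./spec_infer` binary name and `--batch-size` flag are placeholders for the actual launch command, not the real CLI:

```python
import subprocess

# Sweep batch sizes and report the first one that aborts. The binary name
# and --batch-size flag below are hypothetical; substitute your real command.
for bs in (1, 2, 4, 8, 16, 24, 32):
    proc = subprocess.run(
        ["./spec_infer", "--batch-size", str(bs)],  # hypothetical invocation
        capture_output=True,
        text=True,
    )
    crashed = "stack smashing detected" in proc.stderr
    print(f"batch_size={bs} returncode={proc.returncode} crashed={crashed}")
    if crashed:
        break
```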
-
Due to network and permission issues, we cannot call GPT-3.5-Turbo from reasoning_and_editing.py. Could you provide code that uses Llama2 for title-editing generation instead?
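A minimal sketch of such a swap, assuming Hugging Face transformers and the meta-llama/Llama-2-7b-chat-hf checkpoint; the prompt wording and the `edit_title` helper are illustrative, not the script's actual interface:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # requires accepting the license

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def edit_title(draft_title: str) -> str:
    # Hypothetical stand-in for the GPT-3.5-Turbo chat-completion request.
    messages = [
        {"role": "system", "content": "You rewrite draft titles to be concise and fluent."},
        {"role": "user", "content": f"Edit this title: {draft_title}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=64, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(edit_title("a study about improve llm reasoning with editing"))
```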
-
Good day everyone, I am trying to run the llama agentic system on an RTX4090 with FP8 quantization for the inference model and meta-llama/Llama-Guard-3-8B-INT8 for the guard. With sufficiently small max_seq_…
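One way to sanity-check the guard half in isolation is to load the pre-quantized checkpoint directly with transformers; a sketch under the assumption that the INT8 repo ships its own bitsandbytes quantization config, so nothing is needed beyond having bitsandbytes installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

GUARD_ID = "meta-llama/Llama-Guard-3-8B-INT8"

tokenizer = AutoTokenizer.from_pretrained(GUARD_ID)
# Assumption: the quantization config embedded in the checkpoint is picked up
# automatically, so no extra quantization arguments are passed here.
model = AutoModelForCausalLM.from_pretrained(GUARD_ID, device_map="auto")

def moderate(messages):
    # Llama Guard's chat template wraps the conversation in its safety prompt.
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
    out = model.generate(input_ids, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Expected to print "safe" (or "unsafe" plus a hazard category code).
print(moderate([{"role": "user", "content": "How do I bake a cake?"}]))
```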
-
### Jan version
0.5.4
### Describe the Bug
I can successfully load the model for chats, but as soon as I send an image, it crashes.
Context:
- I created a model.json to download the text an…
-
What I understand is that you actually deploy a model (e.g. Llama3.1-70B-Instruct) with 'vllm serve Llama3.1-70B-Instruct ...' and then configure the URL and model name in llama-stack for LLM capab…
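That matches how vLLM's OpenAI-compatible server is normally consumed. A quick sketch for verifying the vLLM side before pointing llama-stack at it, assuming the default local port and that the model name matches what was passed to `vllm serve`:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; by default it listens on port 8000
# and accepts any placeholder API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Llama3.1-70B-Instruct",  # must match the name given to `vllm serve`
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```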
-
**Describe the bug**
Messages break and the inference doesn't complete:
![Interruption](https://github.com/user-attachments/assets/ea06ceee-f49b-4770-b5f6-da0946f73436)
**Steps to reproduce**
1. Create…
-
### Proposal to improve performance
Improve bitsandbytes quantization inference speed
### Report of performance regression
I'm testing llama-3.2-1b on a toy dataset. For offline inference using the…
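For reference, a minimal sketch of that offline path with in-flight bitsandbytes quantization (the exact flags vary across vLLM versions; older releases also require `load_format="bitsandbytes"`):

```python
from vllm import LLM, SamplingParams

# Quantize the unquantized checkpoint on the fly with bitsandbytes.
llm = LLM(
    model="meta-llama/Llama-3.2-1B",
    quantization="bitsandbytes",
)
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Summarize: vLLM offline inference test."], params)
print(outputs[0].outputs[0].text)
```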
-
Hi,
I've been trying to serve different Phi3 models using the llama.cpp server created by ipex-llm's init-llama-cpp.
When I serve with this version I have two problems:
1) The server doesn…
-
Discussion for this is in #373 and #284.
The export script in sharktank was built specifically for llama 3.1 models and has some rough edges. Along with this, it requires users to chain together CLI c…