-
The Yuan 2.0-M32 model R&D team analyzed the current mainstream quantization schemes in depth, weighing model compression gains against accuracy loss, and ultimately adopted the GPTQ quantization method, with AutoGPTQ as the quantization framework.
Model: Yuan2-M32-HF-I…
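For context, a minimal sketch of GPTQ quantization through AutoGPTQ; the model id, calibration text, and 4-bit/group-size settings below are illustrative assumptions, not the team's actual configuration:
```python
# Minimal AutoGPTQ quantization sketch. The model id and calibration
# text are placeholders; real calibration uses many more samples.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "IEITYuan/Yuan2-M32-hf"  # hypothetical HF repo id
quantized_dir = "Yuan2-M32-GPTQ"

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weight quantization
    group_size=128,  # quantize weights in groups of 128 columns
    desc_act=False,  # skip activation-order reordering for faster inference
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_pretrained(
    pretrained_dir, quantize_config, trust_remote_code=True
)

# GPTQ estimates per-layer quantization error on a small calibration set.
examples = [tokenizer("Sample calibration sentence for GPTQ.")]
model.quantize(examples)
model.save_quantized(quantized_dir)
```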
-
The numbers provided are in terms of memory usage. It would be nice to also provide numbers in terms of energy consumption. That is, the current numbers show that an LLM inference can cost twice the energy use…
-
/kind feature
**Describe the solution you'd like**
Hoping to add [https://github.com/xorbitsai/inference](https://github.com/xorbitsai/inference) as a KServe Hugging Face LLM serving runtime.
Xor…
-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- main branch TensorRT-LLM (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…
-
Workgroups are temporary, time-bounded groups; this project should specify the owning SIG and be listed as a subproject of one of the SIGs in the metadata in github.com/kubernetes/community. You can id…
-
## What are the problems?(screenshots or detailed error messages)
Is there a profiling tool available (something profiler-related), or is the only option to use a tool like Nsight and inspect individual operator performance ourselves?
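As a point of comparison, a minimal sketch of per-operator timing with PyTorch's built-in `torch.profiler`, assuming a PyTorch-based workload; the layer and input below are placeholders:
```python
# Minimal torch.profiler sketch for per-operator GPU timing; the layer
# and input stand in for the real workload.
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(8, 4096, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)

# Print the ten operators with the highest total CUDA time.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```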
## What are the types of GPU/CPU you are using?
GPU: A100-80G-SXM4
## What…
-
### Before submitting your bug report
- [ ] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [ ] I'm not able to find an [open issue](ht…
-
### Your current environment
```text
vLLM server latest, as of July 17th 2024: vllm/vllm-openai:latest
```
### 🐛 Describe the bug
I'm trying to get the log probability of the last token (Y…
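For context, a minimal sketch of requesting per-token log probabilities from a vLLM OpenAI-compatible server; the base URL, model name, and prompt are assumptions, not the reporter's setup:
```python
# Query a vLLM OpenAI-compatible server for logprobs; the endpoint and
# model name are placeholders for the actual deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",  # whatever model the server loads
    prompt="The capital of France is",
    max_tokens=1,
    logprobs=5,  # top-5 log probabilities for each generated token
    echo=True,   # also return logprobs for the prompt tokens
)
print(completion.choices[0].logprobs)
```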
-
KServe is a community-driven open-source project aiming to deliver a cloud-native, scalable, extensible serverless ML inference platform. It provides an open standard control and data plane for servi…
-
## Description
vLLM sampling parameters include a [richer set of values](https://github.com/vllm-project/vllm/blob/c9b45adeeb0e5b2f597d1687e0b8f24167602395/vllm/sampling_params.py), among which `lo…