-
Hi, I hit an anomaly while running inference on Mistral with AWQ: below is the GPU usage on a 3090, where the AWQ model consumes 20 GB of GPU memory, even though inference on the base model consumes only 19 GB.
Here is the command: python -m vl…
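For context, vLLM preallocates a KV-cache pool sized by --gpu-memory-utilization (0.9 by default), so the reported usage mostly reflects that pool rather than the weight size, and an AWQ model can show nearly the same footprint as the base model. A minimal sketch of an AWQ launch with a smaller pool, assuming the standard api_server entrypoint (the model repo and port are illustrative placeholders, not the reporter's):

```bash
# Sketch only: serve an AWQ-quantized Mistral with a smaller KV-cache pool.
# --quantization and --gpu-memory-utilization are standard vLLM flags;
# the model repo and port are placeholders.
python -m vllm.entrypoints.openai.api_server \
    --model TheBloke/Mistral-7B-Instruct-v0.2-AWQ \
    --quantization awq \
    --gpu-memory-utilization 0.7 \
    --port 8000
```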
-
### Your current environment
```text
tiktoken==0.6.0
transformers==4.38.1
tokenizers==0.15.2
vLLM Version: 0.4.3
fastchat Version: 0.2.36
```
### 🐛 Describe the bug
Currently, I'm using fa…
-
### OpenVINO Version
2024.3
### Operating System
Windows System
### Device used for inference
Intel UHD Graphics GPU
### Framework
None
### Model used
meta-llama/Llama-3.2-3…
-
Flux Schnell and Flux Dev output the image as base64, while black-forest-labs/flux-1.1-pro outputs a direct link to the image on the Replicate server. Is this normal?
Here is the call used for…
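Replicate models define their own output schema, so a difference like this is plausible rather than a client bug. For reference, a sketch of such a call through Replicate's HTTP API (not necessarily the reporter's exact call, and the prompt is a placeholder); what differs between models is the shape of the `output` field in the returned prediction:

```bash
# Sketch: create a prediction on an official model via Replicate's HTTP API.
# Requires REPLICATE_API_TOKEN to be set; swap the model path to compare
# flux-schnell, flux-dev, and flux-1.1-pro responses.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "a photo of a cat"}}' \
  https://api.replicate.com/v1/models/black-forest-labs/flux-schnell/predictions
```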
-
The stack tool does not support large models with a .pth extension downloaded from Meta; it throws an error at runtime. Does it have to use models downloaded from Hugging Face? Is this setup unreaso…
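If the tool only loads Hugging Face-format checkpoints (an assumption based on the description), one common workaround is converting Meta's .pth weights with the conversion script that ships in the transformers source tree; the paths and size below are placeholders:

```bash
# Sketch: convert Meta's consolidated .pth checkpoints to the Hugging Face
# layout using transformers' bundled script (paths and size are placeholders).
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/meta/llama \
    --model_size 7B \
    --output_dir /path/to/hf/llama
```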
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a…
-
I used the official Docker image and downloaded the weight files from Meta. The md5sum check confirmed the files were fine, but it still failed to run, which left me confused. I confirm that CUDA can be …
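Since the report is cut off before the CUDA detail, only a generic sanity check is worth sketching: confirm the container can see the GPU at all before suspecting the weights (the CUDA image tag is illustrative):

```bash
# Sketch: verify GPU passthrough into Docker; if this fails, the weight
# files were never the problem.
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```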
-
### 🚀 The feature, motivation and pitch
I use vllm.entrypoints.openai.api_server to start my large model; the specific command is as follows:
```bash
python3 -m vllm.entrypoints.openai.api_server…
```
-
How can I disable this kind of error log? Sometimes the network is unstable and the connection is interrupted frequently.
![Dingtalk_20240717155321](https://github.com/user-attachments/assets/9028ef4e-98a6-4f39-a…
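A sketch of one way to quiet such logs, assuming (as with the other reports in this collection) a vLLM-based server; VLLM_LOGGING_LEVEL is a real vLLM environment variable, but check the logger prefix in your own output before relying on it:

```bash
# Sketch: raise the log level so transient network errors are not printed.
# The environment variable is vLLM's; substitute your server's equivalent
# if these logs come from a different project.
export VLLM_LOGGING_LEVEL=ERROR
python -m vllm.entrypoints.openai.api_server --model <model>
```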
-
### Version
5.1.0
### Feature
This is a very loose feature idea; it's not urgent or anything.
It would be useful if, when creating a new dataset in the Fuseki UI, the user were presented with…
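For context on what such a feature would extend: dataset creation in the UI goes through Fuseki's HTTP administration protocol, which today takes only a name and a store type. A sketch of the existing endpoint, assuming an unsecured local server (host and name are placeholders):

```bash
# Sketch: the admin call behind the "new dataset" form. dbType accepts
# "mem" or "tdb2" (persistent); dbName becomes the dataset path.
curl -X POST 'http://localhost:3030/$/datasets?dbName=example&dbType=tdb2'
```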