-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I tried deploying `qwen2-vl-7b` using vLLM with the following commands:
```bash
VLLM_WORK…
```
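For reference, a minimal launch for this model might look like the sketch below. The flags and values shown are illustrative assumptions, not a reconstruction of the truncated command above:

```bash
# A minimal sketch: serve Qwen2-VL-7B through vLLM's OpenAI-compatible
# server. The context length and dtype values here are illustrative.
vllm serve Qwen/Qwen2-VL-7B-Instruct \
  --max-model-len 8192 \
  --dtype bfloat16
```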
-
CMake is not successful.
```
❯ cmake --version
cmake version 3.21.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).
```
```
mkdir build
cd build
cmake -DCMAKE_INSTA…
```
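For comparison, a standard out-of-source configure/build/install sequence on CMake 3.21 looks like the sketch below; the install prefix is a placeholder, since the original `-DCMAKE_INSTA…` flag is truncated above:

```bash
# Minimal out-of-source CMake workflow; run from the project root.
# The install prefix is a placeholder, not the reporter's actual path.
cmake -S . -B build -DCMAKE_INSTALL_PREFIX="$HOME/.local"
cmake --build build --parallel
cmake --install build
```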
-
Hello! I use this simulator for LLM serving, but when I run the following command:
```shell
python3 -u main.py --model_name 'gpt3-6.7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gp…
```
-
## Description
`ignore_eos_token` is a commonly used additional parameter that helps standardize LLM benchmarks by forcing requests to generate a consistent output sequence length.
- Will this change the c…
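For context, a fixed-length benchmark request might look like the sketch below. The endpoint and field names follow vLLM's OpenAI-compatible server, where this option is spelled `ignore_eos`; the model name and token count are only examples:

```bash
# Sketch of a fixed-length benchmark request: generate exactly 256 tokens
# by ignoring EOS. Field names follow vLLM's OpenAI-compatible API; the
# model name is an example.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "prompt": "Benchmark prompt",
        "max_tokens": 256,
        "ignore_eos": true
      }'
```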
-
I can't seem to get this extension to work with LM Studio. I've successfully used my server with other software, so I know the server works.
I have CORS enabled. I'm serving on the local network. I'v…
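As a sanity check, the server can be probed directly from another machine on the network. The sketch below assumes LM Studio's default OpenAI-compatible endpoint on port 1234; the host IP and Origin value are placeholders:

```bash
# Verify the server responds and that CORS headers come back.
# The IP address and Origin are placeholders for this particular setup.
curl -si http://192.168.1.100:1234/v1/models \
  -H "Origin: http://example.local" | grep -iE '^(HTTP|access-control)'
```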
-
Hello,
Similarly to #3, I've tried reproducing the `demo.py` benchmark on an H100 and an A6000 and I'm also seeing no speedup on these platforms at lower precisions.
It was mentioned this is du…
-
## Description
Model artifacts are in the (TRT-LLM) LMI model format:
```
aws s3 ls ***
                           PRE 1/
2024-10-25 14:59:…
```
-
## Description
djl-serving version: djl-inference:0.26.0-tensorrtllm0.7.1
models:
- meta-llama/Llama-2-7b-chat, see https://huggingface.co/meta-llama/Llama-2-7b-chat (used for this report)
- meta-lla…
-
/kind feature
**Describe the solution you'd like**
To autoscale LLM inference services, Knative's request-level metrics may not be the best scaling metrics, as LLM inference is performed at the toke…
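For comparison, the current request-based configuration looks like the sketch below; the annotation keys are Knative's standard autoscaling annotations, while the service name, image, and target value are illustrative. A token-level metric would slot in where `concurrency` appears:

```bash
# Today's Knative KPA scales on request concurrency; a token-level metric
# would replace "concurrency" below. Names, image, and target are examples.
kubectl apply -f - <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-inference
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "concurrency"
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
        - image: example.com/llm-server:latest
EOF
```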
-
### System Info
- It worked when following https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md to run BLIP2-T5 XXL on a single A100 GPU
- However, I have only an A30 for servin…