-
I am planning to use Triton's Python backend to serve an LLM model in PyTorch; more specifically, I want to implement token streaming, and hence, based on the suggestions I read here [https://github.…
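The core of token streaming in a decoupled backend is sending one response per generated token plus a final-flag response. Below is a minimal, framework-free sketch of that pattern; `FakeResponseSender` is a hypothetical stand-in for the real sender that Triton's Python backend provides (in the actual backend, something like `request.get_response_sender()` and `pb_utils.InferenceResponse` would play this role):

```python
import queue

class FakeResponseSender:
    """Hypothetical stand-in for Triton's decoupled response sender.

    In a real Python-backend model.py, the sender comes from the request
    object and responses are pb_utils.InferenceResponse instances; here we
    just record (token, final) pairs to illustrate the streaming shape.
    """
    def __init__(self):
        self.sent = queue.Queue()

    def send(self, token=None, final=False):
        self.sent.put((token, final))

def stream_tokens(sender, tokens):
    # Emit one response per generated token, then an empty response
    # carrying only the "final" flag, mirroring the decoupled pattern.
    for tok in tokens:
        sender.send(token=tok)
    sender.send(final=True)

sender = FakeResponseSender()
stream_tokens(sender, ["Hello", ",", " world"])
```

The key design point is that the client can start rendering after the very first `send`, instead of waiting for the whole sequence.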
-
With the rise of APIs that use server-sent events (SSE), such as ChatGPT, it is becoming increasingly common to want to load-test and measure time-to-first-byte (TTFB).
For example, TTFB can be a prox…
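TTFB for a streaming response can be measured by starting a clock before the request and stopping it when the first chunk arrives. A minimal, transport-agnostic sketch (the `slow_stream` generator is an assumption standing in for a real SSE response body, e.g. the iterator a streaming HTTP client would return):

```python
import time

def measure_ttfb(chunks):
    """Return (ttfb_seconds, full_body) for an iterable of byte chunks."""
    start = time.monotonic()
    ttfb = None
    body = bytearray()
    for chunk in chunks:
        if ttfb is None:
            # First byte observed: this is the TTFB.
            ttfb = time.monotonic() - start
        body.extend(chunk)
    return ttfb, bytes(body)

def slow_stream():
    # Simulated SSE stream: first event arrives after ~50 ms.
    time.sleep(0.05)
    yield b"data: hello\n\n"
    yield b"data: world\n\n"

ttfb, body = measure_ttfb(slow_stream())
```

Note that `time.monotonic()` is used rather than `time.time()` so the measurement is immune to wall-clock adjustments.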
-
Cohere's new Command-R-Plus model reportedly features a 128k context window. However, testing with progressively longer prompts reveals it begins producing nonsensical output (e.g., "\\...") after 819…
-
Use this thread for general discussion and debate regarding the Character Card Spec V2. **Anyone** may freely use this thread to discuss the spec. However, if you are an owner or representative for a …
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
### Describe the bug
I want to use two 4090 GPUs to deploy Qwen/…
-
The GPU naming in the [SCS Flavor Naming Standard](https://github.com/SovereignCloudStack/standards/blob/main/Standards/scs-0100-v3-flavor-naming.md#optional-gpu-support) needs further refinement. The following …
-
I have encountered an issue when attempting to run the `vllm_inference.py` script from the Modal Examples repository. Below are the steps I followed and the error I encountered:
### Steps to Reprod…
-
### Bug Description
After building the flow and using the Playground to test it, the API call returns the last response from the Playground.
### Reproduction
1. Create the flow.
2. Run test in playground.…
-
It seems that, for now, MLC is trying to load all weights onto a single GPU card. After convert_weight/gen_config/compile, it reports an error when it is ready to serve:
```
AssertionError: Cannot estimat…
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…