-
I have the following problem:
```
model=Honkware/openchat_8192-GPTQ
text-generation-launcher --model-id $model --num-shard 1 --quantize gptq --port 8080
```
```
Traceback (most recent call las…
```
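For reference, once the launcher does come up, a request like the following exercises it (a sketch only; the prompt and parameters are illustrative, and TGI serves `POST /generate` on the configured port):

```
import requests

# Hypothetical smoke-test request against the server launched above.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is GPTQ quantization?",
        "parameters": {"max_new_tokens": 64},
    },
)
print(resp.json())
```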
-
### What is the issue?
While testing llava-llama3 on an agentic task of interpreting an image and generating an action, I specified the role of the environment messages as 'environment'. This leads to ollama…
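For context, the request shape involved looks roughly like this (a minimal sketch against ollama's `/api/chat` endpoint; the message contents are illustrative, and 'environment' is the non-standard role in question):

```
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llava-llama3",
        "messages": [
            {"role": "user", "content": "What should I do next?"},
            # Non-standard role: ollama documents system/user/assistant/tool,
            # so this message is where the behavior diverges.
            {"role": "environment", "content": "The screen shows a login form."},
        ],
        "stream": False,
    },
)
print(resp.json())
```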
-
### What happened?
Configuring the text encoders (TEs) as follows:
```
"text_encoder": {
"train": false,
"learning_rate": 2e-8,
"layer_skip": 0,
"weight_dtype": "FLOAT_32",
"stop_trainin…
-
```
data_url = data_url_from_image("dog.jpg")
print("The obtained data url is", data_url)
iterator = client.inference.chat_completion(
    model=model,
    messages=[
        {
            "role": "…
```
-
Hi team, QQ: does `lightseq` support the following?
- Convert HuggingFace BERT/RoBERTa models to `int8` precision directly
- If yes, can the converted model be exported to ONNX format directly? (See the sketch after this list for the non-quantized baseline.)
- …
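Not lightseq-specific, but for reference, the plain `torch.onnx.export` route for a HuggingFace BERT in FP32 looks like this (the int8 conversion asked about above is exactly what this does not cover; the model and axis names are illustrative):

```
import torch
from transformers import BertModel, BertTokenizer

# torchscript=True makes the model return tuples, which the ONNX tracer expects.
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True).eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "bert.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
    },
)
```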
-
### Your current environment
```
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --max-model-len 8192 --served-model-name chat-v2.0 --model /workspace/chat-v2.0 --enforce-eager --tensor-paral…
```
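Once that server is up, it speaks the OpenAI-compatible API, so a request like the following exercises it (a sketch assuming the default port 8000, since `--port` is not shown; the prompt is illustrative):

```
from openai import OpenAI

# Points at the vLLM OpenAI-compatible server started above.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="chat-v2.0",  # matches --served-model-name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```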
-
### System Info
I am working on the benchmarking suite on the vLLM team, and am now trying to run TensorRT-LLM for comparison. I am relying on this GitHub repo (https://github.com/neuralmagic/tensorrt-demo)…
-
**Is your feature request related to a problem? Please describe.**
This issue is similar to the one mentioned here: https://github.com/triton-inference-server/server/issues/7287. I'd like to file an …
-
Currently I'm using an LLM to generate streaming responses, and I found that Triton only supports streaming output through the gRPC protocol. [https://docs.nvidia.com/deeplearning/triton-inference-server/…
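For context, the gRPC streaming path with `tritonclient` looks roughly like this (the model name, tensor names, and shapes are illustrative assumptions; a decoupled model delivers one callback per streamed chunk):

```
import numpy as np
import tritonclient.grpc as grpcclient

def on_stream(result, error):
    # Decoupled models invoke this once per streamed chunk.
    if error is not None:
        print("error:", error)
    else:
        print("chunk:", result.as_numpy("text_output"))

client = grpcclient.InferenceServerClient("localhost:8001")
client.start_stream(callback=on_stream)

prompt = np.array([["Tell me a short story."]], dtype=object)
text_input = grpcclient.InferInput("text_input", [1, 1], "BYTES")
text_input.set_data_from_numpy(prompt)

client.async_stream_infer(model_name="my_llm", inputs=[text_input])
client.stop_stream()  # blocks until in-flight responses are delivered
```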
-
## Problem description
## Reproduction
1. Were you able to run the provided [tutorials](https://github.com/PaddlePaddle/PaddleX/tree/develop/tutorials) successfully?
Yes, they ran normally.
2. Did you modify the tutorial code? If so, please provide the code you ran.
No.
3. Which dataset did you use?
One I annotated myself.
4. Please provide the error message and relevant logs.
[2024/10/1…