-
## Description
I have an ONNX model I would like to convert to a TensorRT engine so I can run some performance tests and compare results. For context, this is a DINO model generated by the MMD…
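A common starting point for this kind of conversion is `trtexec`, the benchmarking tool that ships with TensorRT. A hedged sketch (the file paths and the `--fp16` choice are assumptions, not taken from the issue):

```shell
# Convert an ONNX model to a serialized TensorRT engine with trtexec.
# model.onnx / model.engine are placeholder paths.
trtexec --onnx=model.onnx \
        --saveEngine=model.engine \
        --fp16   # optional: allow FP16 kernels during the build
```

After building, `trtexec` also runs the engine and prints latency/throughput statistics, which covers the perf-testing part of the question.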
-
-
**Describe the bug**
I was trying to run inference with DeepSpeed on the Llama model, but when I ran `deepspeed --num_gpus 4 script.py`, the process terminated automatically after loading the ch…
-
Namespace(agnostic_nms=False, api_key=None, augment=False, cfg='/content/zero-shot-object-tracking/models/yolov5s.yaml', classes=None, confidence=0.4, detection_engine='yolov5', device='', exist_ok=Fa…
-
For my university FYP project on text simplification, I need to generate LASER embeddings for a large number of sentences (15.7 million). However, when I try to generate L…
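Since 15.7 million sentences will not fit through a single embedding call, one common workaround is to stream the corpus in fixed-size batches. A minimal sketch in plain Python; `embed_batch` is a placeholder for whatever LASER embedding function is actually used (it is not from the issue):

```python
from typing import Callable, Iterable, Iterator, List

def batched(sentences: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `batch_size` sentences."""
    batch: List[str] = []
    for s in sentences:
        batch.append(s)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield batch

def embed_corpus(sentences: Iterable[str],
                 embed_batch: Callable[[List[str]], list],
                 batch_size: int = 10_000) -> Iterator:
    """Lazily embed a large corpus without holding it all in memory.

    `embed_batch` is a hypothetical stand-in for the real LASER call
    (e.g. something like Laser.embed_sentences from a LASER wrapper).
    """
    for batch in batched(sentences, batch_size):
        yield from embed_batch(batch)
```

Writing each batch of embeddings to disk as it is produced (rather than accumulating them) keeps peak memory bounded regardless of corpus size.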
-
For the [model](https://github.com/mlperf/inference_results_v0.5/tree/master/open/NVIDIA#2-model-description) and [fine tuning](https://github.com/mlperf/inference_results_v0.5/tree/master/open/NVIDIA…
-
### 🐛 Describe the bug
A simple repro
```python
import torch
x = torch.randn(1, 4, 5, 5, 5, device="cuda") …
-
This is the command I ran:
python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-VL-7B-Instruct --model /home/wangll/llm/model_download_demo/models/Qwen/Qwen2-VL-7B-Instruct
Here is the error output:
INFO 09-03 1…
-
## Description
## Environment
**TensorRT Version**: 8.6.1
**NVIDIA GPU**: T4
**NVIDIA Driver Version**: 525
**CUDA Version**: 11.4 (nvidia-docker2)
**CUDNN Version**: cud…
-
Dear Cromwell dev team,
This is an enhancement suggestion.
When using the Google backend for resource allocation, one can specify `gpuCount` and `gpuType` to request specific resources. I …
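For context, the existing attributes are set in a task's `runtime` block. A sketch of current usage on the Google backend (the GPU type string and zone are illustrative values, not from the suggestion):

```wdl
runtime {
  gpuCount: 1
  gpuType: "nvidia-tesla-t4"
  zones: "us-central1-c"
}
```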