-
To obtain reproducible benchmark results for training/evaluation throughput, we need a reliable model configuration.
It would be great if ideas for configurations targeting the f…
-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related iss…
-
### Tickets
- [ ] Simulate Load
- [ ] Visualize Performance
- [ ] Horizontal Scaling (Automatic/Manual)
- [ ] Vertical Scaling (Automatic/Manual)
-
When I run batch inference, the output from vLLM is sometimes empty, meaning the prediction is empty. Could we make it generate at least one token? An empty output is also strange.
-
### System Info
- 4x A100 SXM 40GB
- CUDA 12.4
- Docker: nvidia/cuda:12.4.0-devel-ubuntu22.04
- TensorRT-LLM version: 0.10.0
### Who can help?
@kaiyux
### Information
- [x] The official exampl…
-
Hello fellow gemm optimizer enthusiast,
It would be extremely useful to provide benchmark utilities, ideally reporting GFLOP/s or TFLOP/s, to compare with other frameworks, compare with the CPU peak theoretic…
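
As a rough illustration of the kind of utility requested above, here is a minimal sketch of how a GEMM benchmark could report GFLOP/s. It assumes the usual convention of counting `2 * M * N * K` floating-point operations for `C = A @ B`. The names `gemm_flops`, `gflops`, `peak_gflops`, `naive_gemm`, and `benchmark` are hypothetical helpers, not part of any existing framework, and the pure-Python kernel is only a stand-in for the real GEMM under test.

```python
import time


def gemm_flops(m, n, k):
    """FLOP count for C = A @ B with A (m x k) and B (k x n).

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Convert a measured wall-clock time into GFLOP/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e9


def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak, assuming every core retires flops_per_cycle each cycle.

    flops_per_cycle depends on the CPU (SIMD width, FMA units) and must be
    supplied by the caller; it is an assumption, not something measured here.
    """
    return cores * ghz * flops_per_cycle


def naive_gemm(a, b):
    """Reference triple-loop GEMM on lists of lists (stand-in kernel)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


def benchmark(m, n, k, repeats=3):
    """Time the kernel several times and report GFLOP/s for the best run."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        naive_gemm(a, b)
        best = min(best, time.perf_counter() - start)
    return gflops(m, n, k, best)
```

Dividing the result of `benchmark(...)` by `peak_gflops(...)` gives the fraction of theoretical peak reached, which is one way to make numbers comparable across machines and frameworks.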
-
### Description
Druid uses a deprecated version of the AWS SDK for Java, [1.x](https://github.com/aws/aws-sdk-java). AWS is asking developers to move to the new version, [2.x](https://github.com/aw…
-
- What you expect to be happening.
Works as normal
- What is actually happening?
Every few minutes there are bad_cast errors that cause the virtual interfaces to be d…
-
I tried to call oneDPL USM and Thrust functions for segment reduction. The pointers (d_keys, d_in, d_keys_out, and d_out) point to device memory.
```
for (int i = 0; i < repeat; i++)
onea…
-
**Ashok Reddy** ([Bug 64846](https://bz.apache.org/bugzilla//show_bug.cgi?id=64846&redirect=false)):
The sum of the throughputs of all sample requests does not match the total throughput.
Example:
sample 1 thr…