-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
### Your current environment
```
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC versio…
-
### Describe the issue
When I run multithreaded inference via onnxruntime (Python), I get an error. My onnx_sessions are all independent and each model file is loaded independently; for multithreaded inference …
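One common pattern for the setup described above is to create each session inside its own thread, so no session object is ever shared. A minimal sketch of that pattern, using a stand-in class in place of `onnxruntime.InferenceSession` (the class, model path, and doubling "inference" are all hypothetical placeholders):

```python
import threading

class FakeSession:
    """Stand-in for onnxruntime.InferenceSession (hypothetical).
    In the real setup, each thread would load its own model file here."""
    def __init__(self, model_path):
        self.model_path = model_path

    def run(self, output_names, feeds):
        # Dummy "inference": just doubles the input.
        return [feeds["x"] * 2]

results = {}

def worker(tid, model_path):
    # One independent session per thread, constructed inside the thread,
    # so sessions are never shared across threads.
    session = FakeSession(model_path)
    results[tid] = session.run(None, {"x": tid})[0]

threads = [threading.Thread(target=worker, args=(i, "model.onnx"))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each `tid` key is written by exactly one thread, so the shared dict stays consistent without extra locking.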
-
### Misc discussion on performance
I've been running some simple tests on a multi-node parallel pipeline with NCCL. I doubled the bandwidth between the nodes but saw no increase in t/s or throughput.…
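A plausible explanation is that the pipeline is compute-bound rather than bandwidth-bound: if inter-node communication overlaps with compute and already takes less time than the compute per microbatch, faster links change nothing. A back-of-envelope model with illustrative (not measured) numbers:

```python
def tokens_per_s(compute_s, comm_bytes, bandwidth_Bps, tokens_per_microbatch):
    """Toy pipeline model: per-microbatch step time is the slower of
    compute and communication, assuming the two overlap."""
    comm_s = comm_bytes / bandwidth_Bps
    step_s = max(compute_s, comm_s)
    return tokens_per_microbatch / step_s

# Assumed numbers: 20 ms compute per microbatch, 32 MB activations
# transferred between stages, 64 tokens per microbatch.
base = tokens_per_s(0.020, 32e6, 10e9, 64)  # 10 GB/s link
fast = tokens_per_s(0.020, 32e6, 20e9, 64)  # doubled bandwidth

# Communication drops from 3.2 ms to 1.6 ms, but both are under the
# 20 ms compute time, so throughput is unchanged: compute-bound.
```

Under these assumptions, doubling bandwidth only helps once `comm_s` exceeds `compute_s`.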
-
Hi, we've tried to compile the code, which worked, but when we run the code with GPU enabled we get an error.
This is the compilation error log:
```
[ 6%] Building CXX object CMU462/src/CMakeFil…
-
I am trying to accelerate the inference speed of Llama 3 8B on a 4090 using quantization. I noticed this https://github.com/huggingface/optimum-nvidia which should allow using fp8 and have huge speed…
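For intuition on the expected gain: single-stream decoding is typically memory-bandwidth-bound, since every generated token streams all weights from VRAM once. A rough ceiling estimate with assumed numbers (8e9 parameters, ~1.0 TB/s effective bandwidth for a 4090), ignoring KV cache and activations:

```python
params = 8e9          # Llama 3 8B parameter count (approximate)
bandwidth = 1.0e12    # bytes/s, assumed effective 4090 memory bandwidth

def est_tokens_per_s(bytes_per_param):
    """Upper bound on decode speed if weight streaming is the bottleneck."""
    return bandwidth / (params * bytes_per_param)

fp16 = est_tokens_per_s(2.0)  # 2 bytes per weight
fp8 = est_tokens_per_s(1.0)   # 1 byte per weight

# Halving bytes per weight roughly doubles the throughput ceiling.
```

This is only a ceiling; real fp8 speedups depend on kernel support and how much of the runtime is actually spent streaming weights.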
-
CUDA offers a library named CUPTI - the [CUDA Profiling Tools Interface](https://developer.nvidia.com/CUPTI-CTK10_2):
> CUPTI provides a set of APIs targeted at ISVs creating profilers and other pe…
-
## 🚀 Feature
A different method for sharing CUDA Tensors across processes is needed on Jetson platforms.
CUDA unified-addressing-based IPC functionality isn't yet supported on Tegra platforms…
-
### Your current environment
The output of `python collect_env.py`
```text
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
-
### Describe the issue
During session runs with CUDAExecutionProvider, I noticed that the inference time varies greatly depending on whether or not I use other applications on my computer.
For exam…
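One way to quantify this kind of variance is to record per-call latencies and compare percentiles: a large gap between p95 and p50 points to external interference (e.g. other applications contending for the GPU) rather than a uniformly slow model. A minimal sketch, where the lambda is a dummy workload standing in for `session.run` (an assumption):

```python
import statistics
import time

def measure(run, n=50):
    """Run the callable n times and summarize its latency distribution."""
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        run()
        latencies.append(time.perf_counter() - t0)
    latencies.sort()
    return {
        "p50": latencies[n // 2],
        "p95": latencies[int(n * 0.95)],
        "stdev": statistics.stdev(latencies),
    }

# Dummy CPU workload in place of the real session.run call.
stats = measure(lambda: sum(range(10_000)))
```

Comparing these numbers with other applications open versus closed would make the interference measurable instead of anecdotal.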