-
### Problem Description
Llama3 8B FP8 OOMs at the same batch size as BF16; I need to decrease the batch size to `2` to avoid the OOM. At batch size 2, TE FP8 is **21% slower** than torch compile B…
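A minimal measurement sketch, assuming a standard PyTorch training loop (the `model`/`make_batch` arguments and `profile_step` helper are placeholders, not the actual repro), that records peak GPU memory and average step time so the BF16 and TE FP8 runs can be compared at the same batch size:

```python
import time
import torch

def profile_step(model, make_batch, batch_size, steps=10):
    """Measure peak GPU memory (GiB) and average step time for a training step.

    `model` and `make_batch` stand in for the actual Llama-3 8B setup
    (BF16 vs. TE FP8); this only shows the measurement harness.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        batch = make_batch(batch_size)
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    step_time = (time.perf_counter() - start) / steps
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    return peak_gib, step_time
```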
-
General improvements
- [ ] Add some hand-crafted trajectories into the mix via https://arxiv.org/pdf/2401.09241 (e.g. drive to goal, stop, slow, fast, perhaps even other algorithms, etc., with sampl…
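As a rough, hypothetical illustration of what such a hand-crafted trajectory could look like (the waypoint format and function below are invented for this sketch and are not taken from the linked paper or the existing pipeline):

```python
import numpy as np

def drive_to_goal(start, goal, speed=1.0, dt=0.1):
    """Hypothetical hand-crafted trajectory: a straight line from start to goal
    at a constant speed, emitted as (x, y) waypoints every `dt` seconds."""
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    dist = np.linalg.norm(goal - start)
    n_steps = max(int(dist / (speed * dt)), 1)
    alphas = np.linspace(0.0, 1.0, n_steps + 1)[:, None]
    return start + alphas * (goal - start)

# Variants for the other cases mentioned above (stop, slow, fast).
stop = np.repeat([[0.0, 0.0]], 20, axis=0)            # hold position
slow = drive_to_goal((0, 0), (10, 0), speed=0.5)
fast = drive_to_goal((0, 0), (10, 0), speed=3.0)
```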
-
Hello,
I would like to ask for tips on setting up and launching Optuna in a RAPIDS environment for multi-GPU hyper-parameter optimization.
Currently, I am using [OptunaSearchCV]…
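One common pattern is to run one Optuna worker process per GPU against a shared RDB storage, so the trials are split across devices. The sketch below assumes a hypothetical `train_and_score` helper in place of the actual RAPIDS/cuML training code:

```python
# Launch once per GPU, e.g.:
#   CUDA_VISIBLE_DEVICES=0 python tune.py &
#   CUDA_VISIBLE_DEVICES=1 python tune.py &
import optuna

def objective(trial):
    # Placeholder objective: swap in the actual cuML/RAPIDS estimator here.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    n_estimators = trial.suggest_int("n_estimators", 50, 500)
    return train_and_score(lr=lr, n_estimators=n_estimators)  # hypothetical helper

if __name__ == "__main__":
    # All worker processes share the same study through the storage backend,
    # so Optuna coordinates which trials each GPU-bound process runs.
    study = optuna.create_study(
        study_name="rapids-multi-gpu",
        storage="sqlite:///optuna.db",
        direction="maximize",
        load_if_exists=True,
    )
    study.optimize(objective, n_trials=50)
```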
-
Hi,
Is there an example of using the GPU from C++? Is it enough to add the lines below to use the GPU?
```
OrtCUDAProviderOptions cudaOptions;
cudaOptions.device_id = 0;
sessionOptions.AppendExec…
```
-
The Matplotlib graph often freezes when manipulated to change the zoom, axis tilt, or rotation. This was first noticed when generating 3D models for the tract regions. This affected the utility of the pro…
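For context, a minimal sketch of the kind of interactive 3D plot that tends to lag (random points stand in for the tract data; downsampling before plotting is a common mitigation):

```python
import numpy as np
import matplotlib.pyplot as plt

# Large point clouds or meshes make interactive zoom/rotate sluggish, since
# every redraw re-renders all artists; downsample before plotting.
pts = np.random.rand(200_000, 3)
pts = pts[::10]  # keep every 10th point to lighten interactive redraws

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=1)
plt.show()
```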
-
(First priority) For comparison: Llama-2 7B, T5-base
(Second priority) For being current: Llama-3.2 70B, T5-??? (try w/ HF first, then vLLM; see the sketch below)
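For the "try w/ HF first, then vLLM" note, a minimal sketch using the Llama-2 7B checkpoint named above (model IDs are illustrative, and the Llama weights are gated on the Hub):

```python
# Hugging Face transformers first...
from transformers import pipeline

hf_gen = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf", device_map="auto")
print(hf_gen("The capital of France is", max_new_tokens=20)[0]["generated_text"])

# ...then the same checkpoint through vLLM for comparison.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=20))
print(out[0].outputs[0].text)
```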
-
## Description
While reading the LightGBM source code and testing it with a binary classifier, I observed that GPU training performance is notably lower than th…
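For reference, a minimal timing sketch on synthetic data (not the original test setup); whether the GPU tree learner wins depends heavily on dataset size, feature count, and `max_bin`, and small datasets often favor the CPU:

```python
import time
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200_000, n_features=100, random_state=0)

for device in ("cpu", "gpu"):
    params = {
        "objective": "binary",
        "device_type": device,   # "gpu" requires a GPU-enabled LightGBM build
        "max_bin": 63,           # smaller bin counts usually help the GPU tree learner
        "verbose": -1,
    }
    train_set = lgb.Dataset(X, label=y)
    start = time.perf_counter()
    lgb.train(params, train_set, num_boost_round=100)
    print(device, f"{time.perf_counter() - start:.1f}s")
```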
-
**Describe the bug**
When running SubCenter-ArcFace-r100-gpu, the container starts but does not function, reporting the following error (see the logs for the full errors).
I have not seen any worker / job when…
-
### Proposal to improve performance
We used the same GPU on two machines but different CPUs, and drew the following experimental conclusions.
Experimental results: the GPU is a 3090, and the CPU w…
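As background for why the host CPU matters even with an identical GPU, a generic PyTorch sketch (not tied to this codebase) comparing many tiny kernel launches, which are dominated by CPU-side launch overhead, with one large GPU-bound op:

```python
import time
import torch

def timed(fn, warmup=3, iters=10):
    # Average wall-clock time of fn(), with CUDA sync so GPU work is included.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

small = [torch.randn(64, 64, device="cuda") for _ in range(2000)]
big = torch.randn(4096, 4096, device="cuda")

# Many tiny matmuls: throughput is mostly CPU-side launch overhead, so it
# varies with the host CPU even though the GPU is identical.
print("tiny ops :", timed(lambda: [x @ x for x in small]))
# One large matmul: bound by the GPU itself, roughly CPU-independent.
print("large op :", timed(lambda: big @ big))
```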
-
**Description**
![output_image](https://github.com/user-attachments/assets/bed4e808-a3e0-4225-96c4-04ae69c65a15)
**Triton Information**
…