-
**Describe the bug**
For ZeRO-3, I'm noticing increased training times on g5.48xlarge nodes with torch >= 2.3.1 and CUDA 12.1. I can reproduce this with small and large models, and in some cases…
-
### Your current environment
```
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC versio…
-
Hello,
Thanks for your help on my previous query.
I'm currently working with the replogle_rpe1_essential dataset, and I have added a few unseen genes just to assess how well the model performs …
-
### OpenVINO Version
tag 2024.1.0
### Operating System
Debian Bookworm (with latest `intel-opencl-icd`: `24.22.29735.21-1`)
### Device used for inference
iGPU
### OpenVINO installati…
-
@simone-silvestri can I convince you to rewrite this section with updated benchmarks, and include results for distributed systems?
https://github.com/CliMA/Oceananigans.jl?tab=readme-ov-file#perfor…
-
**Describe the bug**
Compiling the model for NPU raises a runtime error.
```
--> compiled_model = ov.compile_model(converted_model, device_name='NPU')
RuntimeError: Exception from src/inference/src/c…
-
### 🐛 Describe the bug
A DTensor with Shard placement uses more GPU memory than a raw tensor.
In my test: Shard GPU memory 21890MiB > Replicate GPU memory 17448MiB > raw tensor GPU memory 16804MiB.
Confused for a long time…
-
### System Info
```shell
python: 3.11
OS: Linux
torch: 2.4.0
optimum: 1.21.3
onnx: 1.16.2
onnxruntime: 1.18.1
onnxruntime-gpu: 1.19.0
```
### Who can help?
@JingyaHuang @echarlaix
### Info…
-
### System Info
tensorrt 10.2.0
tensorrt_llm 0.12.0.dev2024072301
A100-80G * 4
### Who can help?
@Tracin
### Information
- [X] The official example scripts
- [ ] My…
-
### 🐛 Describe the bug
When compiled autograd is enabled and the model contains operations that prefer inputs in channels-last format, the gradient and parameter memory layouts are inconsistent:
- parameter…