-
### 🐛 Describe the bug
I have the following code in a file named `debug.py`:
```python
import os
import torch
import argparse
class CustomModel(torch.nn.Module):
    def __init__(self):
…
-
I am trying to use two AEP memory modules on an H3C R4700 G3 platform, but they can't be used. All pmemchk reports are attached.
ndctl version: 71.1
ipmctl version: 03.00.00.0468
# ipmctl show -dimm
Di…
-
### Your current environment
```text
GPU 0: NVIDIA H100 80GB HBM3
GPU 1: NVIDIA H100 80GB HBM3
GPU 2: NVIDIA H100 80GB HBM3
GPU 3: NVIDIA H100 80GB HBM3
GPU 4: NVIDIA H100 80GB HBM3
GPU 5: NV…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12…
-
### 🐛 Describe the bug
Training throughput slows down after loading the sharded optimizer state dict.
1) Train the model and save the model and optimizer state dicts in sharded form. The training throughput looks like this:
![…
-
*re-bob:*
Supply the following if possible:
- Device problem occurs on
TL WDR3600 v1
- Software versions of OpenWrt/LEDE release, packages, etc.
OpenWrt 19.07.4 r11208-ce6496d796 / LuCI openwr…
-
### Your current environment
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (U…
-
### 🐛 Describe the bug
1. When I train with transformers 4.28.1 on 4 GPUs, export the fp16 model with `model.half()`, and serve it,
the service returns different outputs for the same input, the rat…
-
### 🐛 Describe the bug
Collecting environment information...
PyTorch version: 2.0.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.1…
-
**System information**
Steam client version: 1643850988 / built Feb 3 2022 00:45:46
Distribution: Kubuntu 21.04 Hirsute (x86_64)
CPU: AMD Ryzen 7 5800x
NVIDIA GeForce GTX 1650 (Super, proprietary…