-
### 🚀 The feature, motivation and pitch
When using FSDP, the model needs to be loaded on the CPU, but every process loads its own copy, which means it needs 8 times the CPU memory on an 8-GPU machine, causing insufficient CPU memory, is…
-
## 🐛 Bug
Common optimizers like Adam/AdamW take too long in `optimizer.step()` for small models. I tested a small ViT with 5.8M parameters, and `torch.optim.AdamW` takes ~0.2 s for a single step.
#…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Affected Build(s)
5115f64
### Description of Issue
As the title says, it crashes on almost all games which is…
-
### Describe the issue
I applied QAT quantization to a CNN model. When I export it to an ONNX model, inference is slower than with the TorchScript QAT model.
The results are:
torchscript: 4.798517942428589 …
-
### Your current environment
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: CentOS Linux 7 (Core) (x86_64)
GCC version:…
-
### System Info
- `transformers` version: 4.45.0.dev0
- Platform: Linux-5.15.0-1027-gcp-x86_64-with-glibc2.31
- Python version: 3.9.19
- Huggingface_hub version: 0.24.5
- Safetensors version: 0…
-
Hello, I stumbled upon throttled while trying to fix throttling issues on Arch Linux, but it looks like this processor is not yet supported:
```
cpu family : 6
model : 186
model name …
```
-
**Describe the bug**
For ZeRO-3, I'm noticing an increase in training times on g5.48xlarge nodes with torch >= 2.3.1 and CUDA 12.1. I can reproduce this with small and large models, and in some cases…
-
### Your current environment
```
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC versio…
```
-
### System Info
```shell
python: 3.11
OS: Linux
torch: 2.4.0
optimum: 1.21.3
onnx: 1.16.2
onnxruntime: 1.18.1
onnxruntime-gpu: 1.19.0
```
### Who can help?
@JingyaHuang @echarlaix
### Info…