-
https://github.com/KMnP/vpt/blob/94e5be7bddf7a398729c127928a50384b42e95f5/src/solver/optimizer.py#L47
It seems that the weight decay parameter of the AdamW optimizer was not included in the hyperparameter search.
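For context on why this matters: in AdamW the weight decay is decoupled from the gradient-based step, so it behaves as an independent hyperparameter worth searching alongside the learning rate. A minimal pure-Python sketch of the AdamW update on a single scalar parameter (names and defaults here are illustrative, not taken from the linked code):

```python
import math

def adamw_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW step on a scalar parameter (Loshchilov & Hutter style).

    Unlike Adam with L2 regularization, the weight_decay term is applied
    directly to the parameter, decoupled from the adaptive gradient step.
    """
    m = beta1 * m + (1 - beta1) * g            # first moment (EMA of grad)
    v = beta2 * v + (1 - beta2) * g * g        # second moment (EMA of grad^2)
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v
```

Because the decay acts on the parameter itself and is multiplied by `lr`, its effective strength interacts with the learning rate, which is one reason it is commonly swept rather than fixed.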
-
I've been using Prodigy for a few days and honestly I'm very impressed by its performance. In particular, I can set a large learning rate (lr=1, d_coef=10) without blowing up the gradients. However, the final…
-
![image](https://github.com/user-attachments/assets/c4235d6d-4d97-4335-841a-8d7256f44f00)
Code: https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch
8-bit version from bnb: https://github.com/bit…
-
### 🐛 Describe the bug
This is an image model with many small weight tensors. Notice the large white gaps in the GPU section, corresponding to the CPU launchers taking a very long time.
![Screenshot 202…
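A hedged aside: if the gaps come from one small kernel launch per parameter tensor, a common mitigation is a "foreach"/fused-style optimizer that batches all parameters into a few flattened operations (PyTorch exposes this via the `foreach=True` and `fused=True` arguments to `torch.optim.AdamW`). A pure-Python sketch of the batching idea, with hypothetical function names:

```python
def sgd_per_param(params, grads, lr):
    # One update ("launch") per parameter tensor: fixed overhead dominates
    # when there are many small tensors.
    return [[p - lr * g for p, g in zip(ps, gs)]
            for ps, gs in zip(params, grads)]

def sgd_foreach(params, grads, lr):
    # Flatten all parameters into one buffer, update once, then split back;
    # this is the shape of a foreach/fused optimizer step.
    sizes = [len(p) for p in params]
    flat_p = [x for ps in params for x in ps]
    flat_g = [x for gs in grads for x in gs]
    flat_p = [p - lr * g for p, g in zip(flat_p, flat_g)]
    out, i = [], 0
    for n in sizes:
        out.append(flat_p[i:i + n])
        i += n
    return out
```

Both functions compute identical updates; the foreach form just pays the per-call overhead once instead of once per tensor.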
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### System Info
A100 Nvidia 80G GPU
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task…
-
### System Info
- `transformers` version: 4.44.2
- Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
AdamW was replaced by a new optimization method, **diffGrad**. What do you think about it? Would you consider adding it to improve accuracy in the experiments?
**Change**
```
optim_g = torch.optim.Ad…
```
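For context, a hedged sketch of what diffGrad changes relative to Adam: the step is scaled by a "friction" coefficient, the sigmoid of the absolute difference between consecutive gradients, which damps updates when the gradient is nearly constant. A scalar pure-Python illustration (names are mine, not from the diffGrad code):

```python
import math

def diffgrad_step(theta, g, g_prev, m, v, t, lr=1e-3,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One diffGrad step on a scalar parameter (sketch, after Dubey et al.).

    Identical to Adam except that the update is scaled by the friction
    coefficient xi = sigmoid(|g_prev - g|), which lies in (0.5, 1) and
    shrinks toward 0.5 when consecutive gradients are similar.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    xi = 1.0 / (1.0 + math.exp(-abs(g_prev - g)))   # friction coefficient
    theta = theta - lr * xi * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

The caller keeps track of the previous gradient `g_prev` per parameter, which is the only extra state diffGrad needs over Adam.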
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32&suite=…
-
Hi,
where can I get the model file `lrw_resnet18_mstcn_adamw_s3.pth.tar`?
Thanks