-
https://github.com/KMnP/vpt/blob/94e5be7bddf7a398729c127928a50384b42e95f5/src/solver/optimizer.py#L47
It seems that the weight decay parameter of the AdamW optimizer was not included in the hyperparameter search.
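For context on why this matters: in AdamW the weight decay is decoupled from the gradient-based step, so it behaves as an independent hyperparameter worth searching alongside the learning rate. A minimal pure-Python sketch of the AdamW update on a single scalar parameter (names and defaults here are illustrative, not taken from the linked code):

```python
import math

def adamw_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW step on a scalar parameter (Loshchilov & Hutter style).

    Unlike Adam with L2 regularization, the weight_decay term is applied
    directly to the parameter, decoupled from the adaptive gradient step.
    """
    m = beta1 * m + (1 - beta1) * g            # first moment (EMA of grad)
    v = beta2 * v + (1 - beta2) * g * g        # second moment (EMA of grad^2)
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v
```

Because the decay acts on the parameter itself and is multiplied by `lr`, its effective strength interacts with the learning rate, which is one reason it is commonly swept rather than fixed.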
-
I've been using Prodigy for a few days and honestly I'm very impressed by its performance. In particular, I can set a large learning rate (lr=1, d_coef=10) without blowing up the gradients. However, the final…
-
![image](https://github.com/user-attachments/assets/c4235d6d-4d97-4335-841a-8d7256f44f00)
Code: https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch
8-bit version from bnb: https://github.com/bit…
-
### 🐛 Describe the bug
This is an image model with many small weight tensors. Notice the large white gaps in the GPU section, corresponding to the CPU launchers taking a very long time.
![Screenshot 202…
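A hedged aside: if the gaps come from one small kernel launch per parameter tensor, a common mitigation is a "foreach"/fused-style optimizer that batches all parameters into a few flattened operations (PyTorch exposes this via the `foreach=True` and `fused=True` arguments to `torch.optim.AdamW`). A pure-Python sketch of the batching idea, with hypothetical function names:

```python
def sgd_per_param(params, grads, lr):
    # One update ("launch") per parameter tensor: fixed overhead dominates
    # when there are many small tensors.
    return [[p - lr * g for p, g in zip(ps, gs)]
            for ps, gs in zip(params, grads)]

def sgd_foreach(params, grads, lr):
    # Flatten all parameters into one buffer, update once, then split back;
    # this is the shape of a foreach/fused optimizer step.
    sizes = [len(p) for p in params]
    flat_p = [x for ps in params for x in ps]
    flat_g = [x for gs in grads for x in gs]
    flat_p = [p - lr * g for p, g in zip(flat_p, flat_g)]
    out, i = [], 0
    for n in sizes:
        out.append(flat_p[i:i + n])
        i += n
    return out
```

Both functions compute identical updates; the foreach form just pays the per-call overhead once instead of once per tensor.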
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### System Info
A100 Nvidia 80G GPU
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task…
-
### System Info
- `transformers` version: 4.44.2
- Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
AdamW was replaced by a new optimization method, **diffGrad**. What do you think about it? Would you consider adding it to improve accuracy in the experiments?
**Change**
```
optim_g = torch.optim.Ad…
```
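For context, a hedged sketch of what diffGrad changes relative to Adam: the step is scaled by a "friction" coefficient, the sigmoid of the absolute difference between consecutive gradients, which damps updates when the gradient is nearly constant. A scalar pure-Python illustration (names are mine, not from the diffGrad code):

```python
import math

def diffgrad_step(theta, g, g_prev, m, v, t, lr=1e-3,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One diffGrad step on a scalar parameter (sketch, after Dubey et al.).

    Identical to Adam except that the update is scaled by the friction
    coefficient xi = sigmoid(|g_prev - g|), which lies in (0.5, 1) and
    shrinks toward 0.5 when consecutive gradients are similar.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    xi = 1.0 / (1.0 + math.exp(-abs(g_prev - g)))   # friction coefficient
    theta = theta - lr * xi * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

The caller keeps track of the previous gradient `g_prev` per parameter, which is the only extra state diffGrad needs over Adam.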
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32&suite=…
-
Hi,
where can I get the model file `lrw_resnet18_mstcn_adamw_s3.pth.tar`?
Thanks