-
Hi, thanks for this excellent work. I noticed that in the code of train-stage1.py, at line 106, the optimizer is AdamW:
opt = torch.optim.AdamW(
swinir.parameters(), lr=cfg.train.learning_ra…
-
## Issue
Encountered a deadlock while running a JAX-based LLM training script on a TPU-v4-32 pod. SSH'd into worker 0 and ran the script there directly, instead of using `--worker all --command "..."…
-
batch size = 32
lr = 1.5e-4
weight_decay = 0.05
AdamW beta1 = 0.9, beta2 = 0.95
epochs = 400
warmup epochs = 40
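As a concrete reference, a minimal PyTorch sketch wiring up these values (the model is a placeholder, and the cosine decay after warmup is an assumption, since only the warmup length is listed above):

```python
import math
import torch

model = torch.nn.Linear(512, 512)  # placeholder model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1.5e-4,
    betas=(0.9, 0.95),
    weight_decay=0.05,
)

epochs, warmup_epochs = 400, 40

# Linear warmup for 40 epochs, then (assumed) cosine decay over the remaining 360.
def lr_lambda(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```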
-
**Describe the bug**
My model uses DeepSpeed `PipelineModule(num_stages=4)` to split it into 4 parts, and `deepspeed.moe.layer.MoE` is only used in the pipeline stage-1 layer. When my model runs `train_batch`, t…
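For reference, a rough sketch of the kind of setup described (hidden size, expert count, and the placement of the MoE block are placeholders, not the reporter's model; it assumes the script is launched with the `deepspeed` launcher so distributed state is already initialized):

```python
import torch.nn as nn
from deepspeed.pipe import PipelineModule
from deepspeed.moe.layer import MoE

hidden = 1024

class MoEBlock(nn.Module):
    def __init__(self):
        super().__init__()
        expert = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        self.moe = MoE(hidden_size=hidden, expert=expert, num_experts=8, k=1)

    def forward(self, x):
        out, _, _ = self.moe(x)  # DeepSpeed MoE returns (output, l_aux, exp_counts)
        return out

# One MoE block placed early, so that after the 4-way split only one pipeline
# stage holds expert parameters; the remaining layers are dense.
layers = [nn.Linear(hidden, hidden), MoEBlock()] + [nn.Linear(hidden, hidden) for _ in range(6)]
model = PipelineModule(layers=layers, num_stages=4, loss_fn=nn.MSELoss())
# engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config,
#                                        model_parameters=model.parameters())
# engine.train_batch(data_iter=train_iter)
```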
-
### 🚀 The feature, motivation and pitch
Fused AdamW can accept a tensor LR and converts it to an lr_dict internally, but sometimes not all LRs live on the same device, so why not accept `dict[device, Tensor]`…
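For context, a minimal sketch of the current behaviour being discussed: fused AdamW takes a single tensor learning rate (the CUDA device and toy parameter below are assumptions for illustration):

```python
import torch

params = [torch.randn(16, 16, device="cuda", requires_grad=True)]
# A single on-device tensor LR; the request above concerns parameter groups
# whose LR tensors live on different devices.
lr = torch.tensor(1e-3, device="cuda")

opt = torch.optim.AdamW(params, lr=lr, fused=True)

params[0].sum().backward()
opt.step()
```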
-
File "C:\Users\Dell\.conda\envs\xt\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 33, in
class SentenceTransformer(nn.Sequential):
File "C:\Users\Dell\.conda\envs\xt\l…
-
ValueError: paged_adamw_32bit is not a valid OptimizerNames, please select one of ['adamw_hf',
'adamw_torch', 'adamw_torch_fused', 'adamw_torch_xla', 'adamw_apex_fused', 'adafactor', 'adamw_bnb_8bit…
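This error usually means the installed transformers version does not know that optimizer name yet (`paged_adamw_32bit` also needs bitsandbytes). A minimal hedged sketch of falling back to one of the names the error lists (the output directory is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",        # placeholder
    optim="adamw_torch",     # one of the accepted OptimizerNames on older versions
)
```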
-
After training with the bundled three-class dataset_demo, why are the visualized results all noise images?
![image](https://github.com/user-attachments/assets/ca0db7d5-9901-4e7a-abce-e3f2a9022752)
Also, what is the difference between the two saved training results, ema and ckpt?
The config is as follows:
Namespace(seed=0, conditional=T…
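Regarding the ema vs. ckpt question above, a rough sketch of how an EMA copy of the weights is typically maintained during training (the decay value and placeholder model are assumptions, not taken from this repo):

```python
import copy
import torch

model = torch.nn.Linear(4, 4)        # placeholder model
ema_model = copy.deepcopy(model)     # what the "ema" checkpoint usually stores
ema_decay = 0.999                    # assumed typical value

@torch.no_grad()
def update_ema(ema_model, model, decay=ema_decay):
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# Called after every optimizer step; "ckpt" stores model.state_dict(),
# while "ema" stores the smoother ema_model.state_dict(), usually used for sampling.
```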
-
One of the main benefits of Lion is that it stores less optimizer state per parameter.
Adam has to keep both a momentum EMA and an RMSProp-style second-moment EMA, while Lion only keeps the momentum EMA.
When I try to use LI…
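For reference, a minimal sketch of the Lion update rule, showing that a single momentum buffer is the only per-parameter state (the hyperparameter values here are illustrative):

```python
import torch

@torch.no_grad()
def lion_step(param, grad, exp_avg, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    # Decoupled weight decay, as in AdamW.
    param.mul_(1 - lr * weight_decay)
    # Update direction: sign of an interpolation between momentum and gradient.
    update = torch.sign(exp_avg * beta1 + grad * (1 - beta1))
    param.add_(update, alpha=-lr)
    # The only optimizer state kept per parameter: the momentum EMA.
    exp_avg.mul_(beta2).add_(grad, alpha=1 - beta2)

p, g = torch.randn(8), torch.randn(8)
m = torch.zeros_like(p)   # one buffer, vs. two (exp_avg, exp_avg_sq) for Adam
lion_step(p, g, m)
```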
-
Hi, many thanks for your great work.
I am trying to use the default script for training. I find that even if I use batch_size=1, training runs out of memory. I am wondering what might cause the pro…