-
Hi, thanks for this excellent work. I noticed that in the code of train-stage1.py, at line 106, the optimizer is AdamW:
opt = torch.optim.AdamW(
swinir.parameters(), lr=cfg.train.learning_ra…
-
## Issue
Encountered a deadlock while running a JAX-based LLM training script on a TPU-v4-32 pod. SSH'd into worker 0 and ran the script there directly, instead of using `--worker all --command "..."…
-
batch size = 32
lr = 1.5e-4
weight_decay = 0.05
AdamW beta1 = 0.9, beta2 = 0.95
epochs = 400
warmup epochs = 40
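As a concrete reference, a minimal PyTorch sketch wiring up these values (the model is a placeholder, and the cosine decay after warmup is an assumption, since only the warmup length is listed above):

```python
import math
import torch

model = torch.nn.Linear(512, 512)  # placeholder model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1.5e-4,
    betas=(0.9, 0.95),
    weight_decay=0.05,
)

epochs, warmup_epochs = 400, 40

# Linear warmup for 40 epochs, then (assumed) cosine decay over the remaining 360.
def lr_lambda(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```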
-
**Describe the bug**
My model uses DeepSpeed `PipelineModule(num_stages=4)` to split it into 4 parts, and `deepspeed.moe.layer.MoE` is only used in the pipeline stage-1 layer. When my model runs `train_batch`, t…
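For reference, a rough sketch of the kind of setup described (hidden size, expert count, and the placement of the MoE block are placeholders, not the reporter's model; it assumes the script is launched with the `deepspeed` launcher so distributed state is already initialized):

```python
import torch.nn as nn
from deepspeed.pipe import PipelineModule
from deepspeed.moe.layer import MoE

hidden = 1024

class MoEBlock(nn.Module):
    def __init__(self):
        super().__init__()
        expert = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        self.moe = MoE(hidden_size=hidden, expert=expert, num_experts=8, k=1)

    def forward(self, x):
        out, _, _ = self.moe(x)  # DeepSpeed MoE returns (output, l_aux, exp_counts)
        return out

# One MoE block placed early, so that after the 4-way split only one pipeline
# stage holds expert parameters; the remaining layers are dense.
layers = [nn.Linear(hidden, hidden), MoEBlock()] + [nn.Linear(hidden, hidden) for _ in range(6)]
model = PipelineModule(layers=layers, num_stages=4, loss_fn=nn.MSELoss())
# engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config,
#                                        model_parameters=model.parameters())
# engine.train_batch(data_iter=train_iter)
```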
-
### 🚀 The feature, motivation and pitch
Fused AdamW can accept a tensor LR and converts it to an lr_dict internally, but sometimes not all LRs live on the same device, so why not accept `dict[device, Tensor]`…
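For context, a minimal sketch of the current behaviour being discussed: fused AdamW takes a single tensor learning rate (the CUDA device and toy parameter below are assumptions for illustration):

```python
import torch

params = [torch.randn(16, 16, device="cuda", requires_grad=True)]
# A single on-device tensor LR; the request above concerns parameter groups
# whose LR tensors live on different devices.
lr = torch.tensor(1e-3, device="cuda")

opt = torch.optim.AdamW(params, lr=lr, fused=True)

params[0].sum().backward()
opt.step()
```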
-
File "C:\Users\Dell\.conda\envs\xt\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 33, in
class SentenceTransformer(nn.Sequential):
File "C:\Users\Dell\.conda\envs\xt\l…
-
ValueError: paged_adamw_32bit is not a valid OptimizerNames, please select one of ['adamw_hf',
'adamw_torch', 'adamw_torch_fused', 'adamw_torch_xla', 'adamw_apex_fused', 'adafactor', 'adamw_bnb_8bit…
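This error usually means the installed transformers version does not know that optimizer name yet (`paged_adamw_32bit` also needs bitsandbytes). A minimal hedged sketch of falling back to one of the names the error lists (the output directory is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",        # placeholder
    optim="adamw_torch",     # one of the accepted OptimizerNames on older versions
)
```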
-
After training with the bundled three-class dataset_demo, why are the visualized results all noise images?
![image](https://github.com/user-attachments/assets/ca0db7d5-9901-4e7a-abce-e3f2a9022752)
Also, what is the difference between the two saved training results, ema and ckpt?
The config is as follows:
Namespace(seed=0, conditional=T…
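Regarding the ema vs. ckpt question above, a rough sketch of how an EMA copy of the weights is typically maintained during training (the decay value and placeholder model are assumptions, not taken from this repo):

```python
import copy
import torch

model = torch.nn.Linear(4, 4)        # placeholder model
ema_model = copy.deepcopy(model)     # what the "ema" checkpoint usually stores
ema_decay = 0.999                    # assumed typical value

@torch.no_grad()
def update_ema(ema_model, model, decay=ema_decay):
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# Called after every optimizer step; "ckpt" stores model.state_dict(),
# while "ema" stores the smoother ema_model.state_dict(), usually used for sampling.
```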
-
One of the main benefits of Lion is that it stores less optimizer state per parameter.
Adam has to keep both a momentum EMA and an RMSProp-style second-moment EMA, while Lion only keeps the momentum EMA.
When I try to use LI…
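For reference, a minimal sketch of the Lion update rule, showing that a single momentum buffer is the only per-parameter state (the hyperparameter values here are illustrative):

```python
import torch

@torch.no_grad()
def lion_step(param, grad, exp_avg, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    # Decoupled weight decay, as in AdamW.
    param.mul_(1 - lr * weight_decay)
    # Update direction: sign of an interpolation between momentum and gradient.
    update = torch.sign(exp_avg * beta1 + grad * (1 - beta1))
    param.add_(update, alpha=-lr)
    # The only optimizer state kept per parameter: the momentum EMA.
    exp_avg.mul_(beta2).add_(grad, alpha=1 - beta2)

p, g = torch.randn(8), torch.randn(8)
m = torch.zeros_like(p)   # one buffer, vs. two (exp_avg, exp_avg_sq) for Adam
lion_step(p, g, m)
```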
-
Hi, many thanks for your great work.
I am trying to use the default script for training. I find that even if I use batch_size=1, training runs out of memory. I am wondering what might cause the pro…