Eurus-Holmes opened this issue 1 month ago
The data for SFT was run on 16 H100s, and the number is the GPU memory cost on each GPU, not on a single one. Are you fine-tuning 5B?
@zRzRzRzRzRzRzR Yes, fine-tuning 5B with full SFT, not LoRA. I tried setting only_log_video_latents: False to use log_video, but got OOM.
System Info
H100 (80GB)
Information
Reproduction
I can run the SAT SFT examples normally, but when I try to log video on wandb to monitor the training process in more detail, I get an OOM error. The only change I made was in sft.yaml: setting only_log_video_latents: False and enabling wandb.

Expected behavior
I already set batch_size=1; I am not sure what else needs to change to use log_video without running out of memory.
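For reference, the change described in this thread amounts to an sft.yaml edit along these lines (a minimal sketch; only only_log_video_latents and batch_size are taken from this issue, and the surrounding layout is assumed, not copied from the actual file):

```yaml
# sft.yaml (excerpt) -- hypothetical layout for illustration
# With only_log_video_latents: True, only latents are logged and training fits
# on one H100 (80GB); switching it to False makes log_video decode full videos
# for wandb, which is what triggers the OOM reported here.
only_log_video_latents: False
batch_size: 1  # already reduced to 1, as noted above
```

The extra memory comes from decoding latents back to pixel-space video at logging time, so reducing batch_size alone does not avoid it.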