Training hangs after saving a checkpoint

Qrange-group / SUR-adapter

ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities from large language models to build a high-quality textual semantic representation for text-to-image generation.

MIT License

117 stars 2 forks

Training hangs after saving a checkpoint #2

Closed KyleYueye closed 1 year ago

KyleYueye commented 1 year ago

Hi, when training reaches the first checkpointing step, it hangs right after the checkpoint is saved. Have you run into this problem?

if global_step % args.checkpointing_steps == 0:
    if accelerator.is_main_process:
        accelerator.wait_for_everyone()
        accelerator.save(
            accelerator.unwrap_model(noise_model).state_dict(),
            args.output_dir + f"/dae_checkpoint{global_step}.pt",
        )

zhongshsh commented 1 year ago

No, I haven't. I just ran it and it works fine. You could check whether your machine still has enough free storage space.
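
One more thing that may be worth checking, assuming the run is launched with more than one process: in the snippet quoted in the question, `accelerator.wait_for_everyone()` sits inside the `if accelerator.is_main_process:` branch. That call is a barrier, so in a multi-process run the main process would wait there for the other processes, which never reach the call. In a single-process run the call has nothing to wait for, which would be consistent with the code running fine on another machine. Below is a minimal sketch of the same save logic with the barrier moved outside the branch. The helper name `save_checkpoint_if_due` and its parameters are hypothetical and not part of the repository; only the `accelerator` calls are taken from the snippet above.

```python
def save_checkpoint_if_due(accelerator, noise_model, output_dir,
                           global_step, checkpointing_steps):
    """Hypothetical helper: save on checkpoint steps, return the path or None."""
    if global_step % checkpointing_steps != 0:
        return None

    # Barrier: every process has to reach this call, so it is placed
    # outside the is_main_process branch.
    accelerator.wait_for_everyone()

    path = output_dir + f"/dae_checkpoint{global_step}.pt"
    # Only the main process writes the file.
    if accelerator.is_main_process:
        accelerator.save(
            accelerator.unwrap_model(noise_model).state_dict(),
            path,
        )
    return path
```

In the training loop this would be called once per step on every process, for example `save_checkpoint_if_due(accelerator, noise_model, args.output_dir, global_step, args.checkpointing_steps)`. If the hang also shows up with a single process, the barrier is not the cause and the storage check above is the next thing to look at.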