ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

train_inpainting_dreambooth.py problem: AttributeError: Can't pickle local object 'main.<locals>.collate_fn' #135

Open: Elevory opened this issue 1 year ago

Elevory commented 1 year ago

Describe the bug

Hello,

I get an error when I try to train with train_inpainting_dreambooth.py on a local copy of the stable-diffusion-inpainting model. The regular script, train_dreambooth.py, works fine.

Reproduction

Run "accelerate launch train_inpainting_dreambooth.py" with the arguments shown in the log below.

Logs

(diffusers) T:\shivam\examples\dreambooth>accelerate launch --num_cpu_threads_per_process 10 train_inpainting_dreambooth.py   --pretrained_model_name_or_path=".\\models\\stable-diffusion-inpainting"   --pretrained_vae_name_or_path="./sd-vae-ft-mse"   --output_dir="./out/model_inpainting"   --with_prior_preservation --prior_loss_weight=1.0   --seed=3434554   --resolution=512   --train_batch_size=2   --train_text_encoder   --mixed_precision="fp16"   --gradient_accumulation_steps=1   --learning_rate=2e-6   --lr_scheduler="constant"   --lr_warmup_steps=0   --num_class_images=50   --sample_batch_size=1   --max_train_steps=15000   --save_interval=500   --save_min_steps=999   --save_infer_steps=35   --concepts_list="concepts_list.json"   --not_cache_latents   --hflip
[!] Not using xformers memory efficient attention.
Steps:   0%|                                                                                | 0/15000 [00:00<?, ?it/s]<_io.BufferedWriter name=3>
<_io.BufferedWriter name=3>
[!] Not using xformers memory efficient attention.
Traceback (most recent call last):
  File "train_inpainting_dreambooth.py", line 869, in <module>
    main(args)
  File "train_inpainting_dreambooth.py", line 777, in main
    for step, batch in enumerate(train_dataloader):
  File "T:\programs\anaconda3\envs\diffusers\lib\site-packages\accelerate\data_loader.py", line 345, in __iter__
    dataloader_iter = super().__iter__()
  File "T:\programs\anaconda3\envs\diffusers\lib\site-packages\torch\utils\data\dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "T:\programs\anaconda3\envs\diffusers\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "T:\programs\anaconda3\envs\diffusers\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "T:\programs\anaconda3\envs\diffusers\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "T:\programs\anaconda3\envs\diffusers\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "T:\programs\anaconda3\envs\diffusers\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "T:\programs\anaconda3\envs\diffusers\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "T:\programs\anaconda3\envs\diffusers\lib\multiprocessing\reduction.py", line 61, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.collate_fn'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "T:\programs\anaconda3\envs\diffusers\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "T:\programs\anaconda3\envs\diffusers\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Steps:   0%|                                                                                | 0/15000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "T:\programs\anaconda3\envs\diffusers\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "T:\programs\anaconda3\envs\diffusers\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "T:\programs\anaconda3\envs\diffusers\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "T:\programs\anaconda3\envs\diffusers\lib\site-packages\accelerate\commands\accelerate_cli.py", line 43, in main
    args.func(args)
  File "T:\programs\anaconda3\envs\diffusers\lib\site-packages\accelerate\commands\launch.py", line 837, in launch_command
    simple_launcher(args)
  File "T:\programs\anaconda3\envs\diffusers\lib\site-packages\accelerate\commands\launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['T:\\programs\\anaconda3\\envs\\diffusers\\python.exe', 'train_inpainting_dreambooth.py', '--pretrained_model_name_or_path=.\\\\models\\\\stable-diffusion-inpainting', '--pretrained_vae_name_or_path=./sd-vae-ft-mse', '--output_dir=./out/model_inpainting', '--with_prior_preservation', '--prior_loss_weight=1.0', '--seed=3434554', '--resolution=512', '--train_batch_size=2', '--train_text_encoder', '--mixed_precision=fp16', '--gradient_accumulation_steps=1', '--learning_rate=2e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=50', '--sample_batch_size=1', '--max_train_steps=15000', '--save_interval=500', '--save_min_steps=999', '--save_infer_steps=35', '--concepts_list=concepts_list.json', '--not_cache_latents', '--hflip']' returned non-zero exit status 1.

(diffusers) T:\shivam\examples\dreambooth>pause
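For anyone hitting the same traceback: on Windows, multiprocessing starts DataLoader worker processes with the "spawn" method. Spawn pickles each worker's arguments, including the DataLoader's collate_fn, and sends them to the child (the popen_spawn_win32.py and reduction.py frames above). A function defined inside main() cannot be pickled by reference, which produces the AttributeError. The child then fails with "EOFError: Ran out of input" because it never receives that data. The sketch below reproduces only the pickling step, using multiprocessing's own ForkingPickler and no worker processes. The function names are illustrative and are not taken from the script.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


def make_local_collate_fn():
    # Nested inside another function, like collate_fn inside main() in the
    # training script, so its qualified name contains '<locals>'.
    def collate_fn(examples):
        return examples

    return collate_fn


def module_level_collate_fn(examples):
    # Defined at module level, so the pickler can record it by name.
    return examples


def can_send_to_worker(obj):
    """Return True if obj survives the pickling step that spawn performs."""
    try:
        ForkingPickler.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Older Pythons raise AttributeError ("Can't pickle local object");
        # newer ones may raise PicklingError instead.
        return False


if __name__ == "__main__":
    print(can_send_to_worker(make_local_collate_fn()))  # False
    print(can_send_to_worker(module_level_collate_fn))  # True
```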

System Info

Elevory commented 1 year ago

Hi,

I ran a few more tests this morning, unfortunately to no avail:

All attempts resulted in the same error. At this point I have to assume that the script is unfinished, that it is incompatible with Windows, or that something went mysteriously wrong with my conda environment.

Can anyone confirm whether they got this script working on Windows?

Elevory commented 1 year ago

I may have fixed it: on line 634 of train_inpainting_dreambooth.py, remove num_workers=8 from the arguments of the DataLoader call. The script no longer crashes, though it remains to be seen whether it produces good results.
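Removing num_workers=8 makes the DataLoader fall back to its default of num_workers=0. Batches are then loaded in the main process, so collate_fn never has to be pickled. The likely cost is slower data loading. If you would rather keep worker processes, the other common fix, which is not part of the script, is to define the collate function at module level and bind any values it used from main() with functools.partial. The sketch below assumes this approach. The toy dataset, the `scale` parameter, and the `build_dataloader` helper are hypothetical stand-ins for whatever the real collate_fn uses.

```python
import functools

import torch
from torch.utils.data import DataLoader


def collate_fn(examples, scale):
    # Module-level, so DataLoader workers started with "spawn" (the Windows
    # default) can unpickle it. 'scale' is a hypothetical stand-in for any
    # value the original nested collate_fn captured from main().
    return torch.stack(examples) * scale


def build_dataloader(num_workers):
    # Toy dataset: eight 3-element tensors filled with 0.0, 1.0, ..., 7.0.
    data = [torch.full((3,), float(i)) for i in range(8)]
    return DataLoader(
        data,
        batch_size=2,
        shuffle=False,
        # A functools.partial of a module-level function is picklable,
        # unlike a closure defined inside main().
        collate_fn=functools.partial(collate_fn, scale=0.5),
        num_workers=num_workers,
    )


if __name__ == "__main__":
    # The guard matters on Windows: spawned workers re-import this file.
    for num_workers in (0, 2):
        batches = list(build_dataloader(num_workers))
        print(num_workers, len(batches), tuple(batches[0].shape))
```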

geocine commented 1 year ago

I experienced the same issue; your solution worked.
