Open ProGamerGov opened 2 years ago
I got a bit further by doing:
huggingface-cli login
accelerate launch --save_sample_prompt "a photo of sks <concept>" --pretrained_model_name_or_path "v1-5-pruned-emaonly.ckpt" --instance_data_dir "training_images" --class_data_dir "<concept>" --output_dir "text-inversion-model" --with_prior_preservation --prior_loss_weight 1.0 --instance_prompt "photo of sks <concept>" --class_prompt "<concept>" --seed 1337 --resolution 512 --train_batch_size 1 --train_text_encoder --mixed_precision "no" --gradient_accumulation_steps 1 --learning_rate 1e-6 --lr_scheduler "constant" --lr_warmup_steps 0 --num_class_images 2000 --sample_batch_size 4 --max_train_steps 15000 --save_interval 500 --pretrained_vae_name_or_path "vae-ft-ema-560000-ema-pruned.ckpt"
But it still ends up doing nothing with no indication of what's wrong.
The script just hangs, with no indication of any errors or progress:
user@instance-1:~$ accelerate launch --save_sample_prompt "a photo of sks <concept>" --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5" --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-ema" --instance_data_dir "training_images" --class_data_dir <concept> --output_dir "text-inversion-model" --with_prior_preservation --prior_loss_weight 1.0 --instance_prompt "photo of sks <concept>" --class_prompt "<concept>" --seed 1337 --resolution 512 --train_batch_size 1 --train_text_encoder --mixed_precision "no" --gradient_accumulation_steps 1 --learning_rate 1e-6 --lr_scheduler "constant" --lr_warmup_steps 0 --num_class_images 2000 --sample_batch_size 4 --max_train_steps 15000 --save_interval 500
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_cpu_threads_per_process` was set to `6` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[!] Not using xformers memory efficient attention.
/opt/conda/lib/python3.7/site-packages/accelerate/ UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
This is what I have installed on the instance:
@ShivamShrirao Any ideas on why it doesn't work?
Can't say. Btw you should install xformers. To check where the script is hanging, press Ctrl+C. The traceback will show where it was stuck.
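If pressing Ctrl+C is awkward (for example when the job runs under a launcher), a rough alternative is Python's standard `faulthandler` module, which can print every thread's stack after a timeout. The sketch below is an assumption about how one might wire it in near the top of a training script; the helper names `enable_hang_dump` and `cancel_hang_dump` and the 60-second default are illustrative and not part of this repo.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s=60.0, stream=None):
    """Periodically dump all thread stacks so a silent hang shows its location.

    Hypothetical helper, not part of the dreambooth script.
    `stream` must be a real file object (it needs a file descriptor).
    """
    target = stream if stream is not None else sys.stderr
    # repeat=True re-arms the timer, so a dump is written every timeout_s
    # seconds until cancel_hang_dump() is called.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def cancel_hang_dump():
    """Stop the periodic dumps once the slow section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `enable_hang_dump()` before the suspect section and `cancel_hang_dump()` after it should leave a stack dump in the log each time the timeout elapses, without needing to interrupt the process.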
@ShivamShrirao I did some more testing and it looks like it hangs on the following line for some reason:
if args.seed is not None:
When I omitted the seed parameter, everything worked.
Without using the seed parameter, it makes it up to this line before it stops working again:
I tried looking for similar issues:
But I'm not sure why it's hanging on this line for this repo.
These are the parameters that I'm using:
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export VAE_NAME="stabilityai/sd-vae-ft-mse"
export INSTANCE_DIR="concept_images"
export CLASS_DIR="class_reg_images"
export OUTPUT_DIR="path-to-save-model"
accelerate launch \
--pretrained_model_name_or_path=$MODEL_NAME \
--pretrained_vae_name_or_path=$VAE_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks <concept>" \
--class_prompt="<concept>" \
--save_sample_prompt="photo of sks <concept>" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=1290 \
--save_interval=500 \
--max_train_steps=25000 \
--train_text_encoder \
--mixed_precision="no" \
This issue may be related?
Can't say. Btw you should install xformers. To check where script is hanging, press ctrl+C. The traceback will show where it was stuck.
I need to find a pre-compiled xformers binary for the A100 40GB card first.
I just tried using this version of xformers and got the same issue:
pip install -q
user@instance-1:~$ sh
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_cpu_threads_per_process` was set to `6` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
WARNING:root:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Downloading: 100%|██████████| 1.06M/1.06M [00:00<00:00, 2.89MB/s]
Downloading: 100%|██████████| 525k/525k [00:00<00:00, 2.18MB/s]
Downloading: 100%|██████████| 472/472 [00:00<00:00, 471kB/s]
Downloading: 100%|██████████| 806/806 [00:00<00:00, 848kB/s]
Downloading: 100%|██████████| 617/617 [00:00<00:00, 607kB/s]
Downloading: 100%|██████████| 492M/492M [00:06<00:00, 72.7MB/s]
Downloading: 100%|██████████| 335M/335M [00:04<00:00, 73.6MB/s]
Downloading: 100%|██████████| 547/547 [00:00<00:00, 522kB/s]
Downloading: 100%|██████████| 3.44G/3.44G [00:48<00:00, 70.2MB/s]
Downloading: 100%|██████████| 743/743 [00:00<00:00, 721kB/s]
Steps: 0%| | 0/25000 [00:00<?, ?it/s]
I was able to get it working!
I created a file called environment.yaml
and put this inside:
name: ldm
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.8.10
  - pip=20.3
  - cudatoolkit=11.3
  - pip:
    - git+
    - accelerate==0.12.0
    - torchvision
    - transformers>=4.21.0
    - ftfy
    - tensorboard
    - modelcards
Next I ran:
conda env create -f environment.yaml
Followed by:
conda activate ldm
After running the dreambooth script, it finally gave me an error:
NVIDIA A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-SXM4-40GB GPU with PyTorch, please check the instructions at
So I ran the following command, and now the dreambooth script seems to work!
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
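The sm_80 message comes down to a mismatch between the GPU's compute capability and the architectures the PyTorch binary was compiled for. A rough way to check this before launching, assuming PyTorch is importable, is to compare `torch.cuda.get_device_capability()` against `torch.cuda.get_arch_list()`. The helper name `build_supports_device` is hypothetical, and its "same major version" rule is a simplification, not a copy of PyTorch's own check.

```python
def build_supports_device(capability, arch_list):
    """Return True if any compiled sm_XY architecture shares the GPU's major version.

    capability: (major, minor) tuple, e.g. (8, 0) for an A100.
    arch_list:  strings such as "sm_70" or "compute_80".
    Simplified rule (an assumption): same major version means compatible.
    """
    major, _minor = capability
    compiled = [int(a.split("_")[1]) for a in arch_list if a.startswith("sm_")]
    return any(sm // 10 == major for sm in compiled)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, build_supports_device(cap, torch.cuda.get_arch_list()))
    else:
        print("PyTorch with a visible CUDA device is required for the live check")
```

With the values from the error message, `build_supports_device((8, 0), ["sm_37", "sm_50", "sm_60", "sm_70"])` evaluates to `False`, which matches what PyTorch reported for the A100.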
I'm having trouble repeating my above success, even when using the exact same commands:
wget -q
conda env create -f conda.yaml
conda activate ldm
huggingface-cli login
pip install -q
accelerate config
pip install triton
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_cpu_threads_per_process` was set to `6` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
cannot open shared object file: No such file or directory
WARNING:root:WARNING: cannot open shared object file: No such file or directory
Need to compile C++ extensions to get sparse attention suport. Please run python build develop
Downloading: 100%|██████████| 1.06M/1.06M [00:00<00:00, 2.89MB/s]
Downloading: 100%|██████████| 525k/525k [00:00<00:00, 2.17MB/s]
Downloading: 100%|██████████| 472/472 [00:00<00:00, 494kB/s]
Downloading: 100%|██████████| 806/806 [00:00<00:00, 856kB/s]
Downloading: 100%|██████████| 617/617 [00:00<00:00, 619kB/s]
Downloading: 100%|██████████| 492M/492M [00:04<00:00, 100MB/s]
Downloading: 100%|██████████| 335M/335M [00:03<00:00, 100MB/s]
Downloading: 100%|██████████| 547/547 [00:00<00:00, 530kB/s]
Downloading: 100%|██████████| 3.44G/3.44G [00:35<00:00, 98.2MB/s]
Downloading: 100%|██████████| 743/743 [00:00<00:00, 678kB/s]
Steps: 0%| | 0/25000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "", line 765, in <module>
File "", line 712, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/models/", line 296, in forward
sample, res_samples = downsample_block(
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/models/", line 563, in forward
hidden_states = attn(hidden_states, context=encoder_hidden_states)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/models/", line 169, in forward
hidden_states = block(hidden_states, context=context)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/models/", line 218, in forward
hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/models/", line 291, in forward
hidden_states = xformers.ops.memory_efficient_attention(query, key, value)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/xformers/", line 617, in memory_efficient_attention
op = AttentionOpDispatch.from_arguments(
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/xformers/", line 580, in op
raise NotImplementedError(f"No operator found for this attention: {self}")
NotImplementedError: No operator found for this attention: AttentionOpDispatch(dtype=torch.float32, device=device(type='cpu'), k=40, has_dropout=False, attn_bias_type=<class 'NoneType'>, kv_len=4096, q_len=4096)
Steps: 0%| | 0/25000 [00:04<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/envs/ldm/bin/accelerate", line 8, in <module>
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/accelerate/commands/", line 43, in main
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/accelerate/commands/", line 837, in launch_command
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/accelerate/commands/", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/ldm/bin/python', '',
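One detail in the NotImplementedError above is easy to miss: the dispatch was attempted with `device(type='cpu')`, which may mean the tensors never reached the GPU (for instance if the installed PyTorch build could not use CUDA). Below is a small sketch for pulling that field out of such a message. `dispatch_device` is a hypothetical helper, and the regex is written only against the message format shown in this log.

```python
import re

# Matches the device field as it appears in the log above, e.g. device(type='cpu').
_DEVICE_RE = re.compile(r"device\(type='(\w+)'")


def dispatch_device(message):
    """Return the device type named in an AttentionOpDispatch message, or None."""
    match = _DEVICE_RE.search(message)
    return match.group(1) if match else None


msg = (
    "No operator found for this attention: AttentionOpDispatch("
    "dtype=torch.float32, device=device(type='cpu'), k=40, has_dropout=False, "
    "attn_bias_type=<class 'NoneType'>, kv_len=4096, q_len=4096)"
)
if dispatch_device(msg) == "cpu":
    print("attention inputs were on the CPU; check torch.cuda.is_available()")
```

If the reported device is `cpu` on a GPU machine, checking `torch.cuda.is_available()` in the same environment is a reasonable next step.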
Same error reported here:
Seems like it's a PyTorch version issue:
The new error I have was reported here as well:
Looking at the log for when I succeeded, I see the following PyTorch / CUDA versions:
torchvision pytorch/linux-64::torchvision-0.13.1-py38_cu113 None
pytorch pytorch/linux-64::pytorch-1.12.1-py3.8_cuda11.3_cudnn8.3.2_0 None
So, maybe the versions are somehow getting messed up?
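To test the version-drift theory without reading install logs, one option is to compare the installed package versions against the ones from the successful run. This is a sketch using the standard `importlib.metadata` module (Python 3.8+). `check_pins` and its `lookup` parameter are hypothetical names, and the pip distribution name `torch` is assumed to correspond to the conda `pytorch` package.

```python
from importlib import metadata


def check_pins(expected, lookup=metadata.version):
    """Return {package: (expected, installed)} for every package that differs.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu113" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Versions taken from the successful install log above.
print(check_pins({"torch": "1.12.1", "torchvision": "0.13.1"}))
```

An empty dictionary means the environment matches the working one; anything else lists the expected and installed versions side by side.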
I think that it may have been the PyTorch version. I tried using this environment.yaml:
name: ldm
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.8.10
  - pip=20.3
  - cudatoolkit=11.3
  - pytorch=1.12.1
  - torchvision=0.13.1
  - pip:
    - git+
    - triton
    - accelerate==0.12.0
    - torchvision
    - transformers>=4.21.0
    - ftfy
    - tensorboard
    - modelcards
And I used it as part of these commands:
wget -q
conda env create -f environment.yaml
conda activate ldm
pip install -q
huggingface-cli login
And it worked!
@ProGamerGov what base docker image did you use in that case?
Describe the bug
I tried running the code earlier, and nothing seemed to happen after I ran the script via cmd:
I literally start the instance, upload my images, download the models and run the following code:
I tried again without the accelerate stuff:
It'd be helpful if there was some sort of indication if stuff was happening behind the scenes.
System Info
Debian Instance on GCP with an A100 40GB graphics card.