Hi, I tested the training on an A6000 GPU with CUDA 12.1 and torch 2.1, and it takes 23.61 GB of memory. I guess changing the CUDA version is worth a try to reduce the memory usage.
Thank you, I'll try it.
Unfortunately, changing only the CUDA version didn't work. I tried using PyTorch's gradient checkpointing to reduce VRAM usage; the code was changed as below:
self.enable_gradient_checkpointing()
was added in src/model/unet_2d_multicondition.py, and line 421 of src/model/unet_2d_blocks.py was modified:
if self.training and self.gradient_checkpointing:
    # Original wrapper from the repo (commented out); it forwarded return_dict to the module.
    # def create_custom_forward(module, return_dict=None):
    #     def custom_forward(*inputs):
    #         if return_dict is not None:
    #             return module(*inputs, return_dict=return_dict)
    #         else:
    #             return module(*inputs)

    # Simplified wrapper: just call the module on the checkpointed inputs.
    def create_custom_forward(module):
        def custom_forward(*inputs):
            return module(*inputs)
        return custom_forward

    # Recompute the resnet block during backward instead of storing its activations.
    hidden_states = torch.utils.checkpoint.checkpoint(
        create_custom_forward(resnet), hidden_states, temb)
    # Checkpoint each conditional attention branch (one per condition key), then average the results.
    cond_hidden_states = {
        k: torch.utils.checkpoint.checkpoint(
            create_custom_forward(
                # attns[k], return_dict=False), hidden_states,
                attns[k]), hidden_states,
            encoder_hidden_states[k]
        )[0] for k in attns.keys() if encoder_hidden_states[k] is not None
    }
    hidden_states = torch.mean(torch.stack(
        list(cond_hidden_states.values())), dim=0)
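(As an aside, for readers unfamiliar with the pattern above, here is a minimal self-contained sketch of the same idea: each conditional branch is run through torch.utils.checkpoint so its activations are recomputed during backward instead of stored, and the branch outputs are averaged. The ToyCrossAttn module, the branch names, and use_reentrant=False, which assumes PyTorch >= 1.11, are illustrative placeholders, not DiffSTE's actual code.)

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class ToyCrossAttn(nn.Module):
    """Toy stand-in for a conditional attention block taking (hidden, context)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
    def forward(self, hidden, context):
        return self.proj(hidden) + context

branches = nn.ModuleDict({"text": ToyCrossAttn(16), "style": ToyCrossAttn(16)})
hidden = torch.randn(2, 16, requires_grad=True)
cond = {"text": torch.randn(2, 16), "style": None}  # a branch may have no condition

# Checkpoint each branch whose condition is present, then average the outputs.
outs = {
    k: checkpoint(branches[k], hidden, cond[k], use_reentrant=False)
    for k in branches if cond[k] is not None
}
hidden_out = torch.mean(torch.stack(list(outs.values())), dim=0)
hidden_out.sum().backward()
print(hidden.grad is not None)  # True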
With these changes, the repo can be trained on an RTX 3090.
However, the reproduced results do not look satisfactory. Referring to the issue about checkpoints/new_tunedvae, I tried three methods:
vae:
  normalizer: 0.21966682713
  pretrained_model_path: checkpoints/new_tunedvae
  optimize_vae: true
vae:
  normalizer: 0.21966682713
  pretrained_model_path: checkpoints/new_tunedvae
But none of these results look good. One generated sample is shown below (seed=12897398647).
Any advice on this problem? Thanks a lot.
I'm having the same problem as you; I'm also training on an RTX 3090. What exactly did you modify in that part of the code?
Just like what was mentioned in the previous answer, two changes were made (the code changes shown above).
Like this? Thank you very much. I followed the same method as you and can already train. It's just that I only have one RTX 3090, and I ran into this situation at the beginning of training; is it normal?
I'm having the same problem as you; I'm also training on an RTX 3090. What exactly did you modify in that part of the code?
Just like what was mentioned in the previous answer, two changes were made:
- At the end of the __init__ code in src/model/unet_2d_multicondition.py, the line self.enable_gradient_checkpointing() was added (a rough sketch of what this call does follows after this list).
- In src/model/unet_2d_blocks.py, using the create_custom_forward function from the original code caused an error, so it was modified to the code shown above. With these adjustments, training could proceed.
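For anyone wondering what that single added line does: in diffusers, ModelMixin.enable_gradient_checkpointing() walks the submodules and sets a gradient_checkpointing flag on the ones that support it, and the blocks check that flag in forward() (as in the snippet earlier in this thread). The rough illustration below uses the stock UNet2DConditionModel with an arbitrary small config, not DiffSTE's UNet2DMultiConditionModel:

from diffusers import UNet2DConditionModel

# Arbitrary small config purely for illustration; not DiffSTE's model.
unet = UNet2DConditionModel(
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    block_out_channels=(32, 64),
    layers_per_block=1,
    cross_attention_dim=64,
)

# The call only flips flags; the checkpointed code path is taken in forward()
# when the model is in training mode.
unet.enable_gradient_checkpointing()
print(any(getattr(m, "gradient_checkpointing", False) for m in unet.modules()))  # True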
https://github.com/UCSB-NLP-Chang/DiffSTE/issues/17#issuecomment-1976206836
I put runwayml/stable-diffusion-inpainting locally:
Have you guys encountered this issue: AttributeError: 'UNet2DConditionModel' object has no attribute 'encoder'?
@Question406 @kd-scki3011 After training on two A100 cards (80GB memory) without using the checkpoint method, the results were as expected, so there might be some bugs in the checkpoint method.
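One hypothetical way to narrow down whether the checkpoint wrapper itself is at fault (a debugging sketch, not part of the repo): run the same block with and without checkpointing and check that outputs and input gradients match. The toy block below is a placeholder; use_reentrant=False assumes PyTorch >= 1.11.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(32, 32), nn.GELU(), nn.Linear(32, 32))  # placeholder block

x_plain = torch.randn(4, 32, requires_grad=True)
x_ckpt = x_plain.detach().clone().requires_grad_(True)

out_plain = block(x_plain)
out_ckpt = checkpoint(block, x_ckpt, use_reentrant=False)

out_plain.sum().backward()
out_ckpt.sum().backward()

# A mismatch here would point at the checkpoint wrapper rather than the model.
print(torch.allclose(out_plain, out_ckpt))        # expect True
print(torch.allclose(x_plain.grad, x_ckpt.grad))  # expect True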
Thanks for your great work! I want to follow your work and reproduce the results in the paper. My hardware environment is 7 RTX 3090s (24GB memory each; one card is already in use, so I use 7 cards), but I get a "CUDA out of memory" error even though I set batch size = 1 in configs/config_charinpaint.yaml.
This is my train command:
This is my configs/config_charinpaint.yaml:
This is the training log:
I found that the error occurs when the iteration count reaches 7, which seems related to "Gradient Accumulation steps = 8". Maybe the backward pass consumes too much memory? Have you ever met this problem, or what should I do to solve it? Thanks a lot.
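Not an authoritative answer, but one thing worth checking: with gradient accumulation = 8, the optimizer step only runs at the accumulation boundary, and AdamW allocates its moment buffers on its first step(), so the peak at iteration 7 may come from the optimizer step rather than from backward itself. A hypothetical way to localize the peak (the model, sizes, and optimizer below are placeholders, not the repo's training loop):

import torch
import torch.nn as nn

# Placeholder model and batch; the point is the torch.cuda peak-memory counters.
device = "cuda"
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
opt = torch.optim.AdamW(model.parameters())
x = torch.randn(64, 1024, device=device)

torch.cuda.reset_peak_memory_stats(device)
loss = model(x).pow(2).mean()
print("peak after forward :", torch.cuda.max_memory_allocated(device) // 2**20, "MiB")

loss.backward()
print("peak after backward:", torch.cuda.max_memory_allocated(device) // 2**20, "MiB")

opt.step()                       # AdamW state buffers are allocated here on first use
opt.zero_grad(set_to_none=True)
print("peak after step    :", torch.cuda.max_memory_allocated(device) // 2**20, "MiB")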