frank-xwang / InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
Apache License 2.0

Cuda out of memory #24

Open Vicvickyue opened 7 months ago

Vicvickyue commented 7 months ago

Hello! Thank you so much for your amazing work. I'm posting to ask about the CUDA out-of-memory error I encounter when running the InstanceDiffusion inference demo. I'm using a single RTX 3050, and no other process is using the GPU while I run the program. [screenshot: error message]

frank-xwang commented 7 months ago

Hi, you may want to use a smaller `--num_images`. Also, please confirm that flash attention (enabled by default) is being used, as it reduces memory usage.
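As a quick sanity check (a minimal sketch, not part of the repo), you can verify that the `flash_attn` package is importable in your environment before launching inference:

```python
import importlib.util

def has_flash_attn() -> bool:
    # True if the flash-attn package is importable in this environment.
    return importlib.util.find_spec("flash_attn") is not None

if not has_flash_attn():
    print("flash-attn not found: install it, or expect higher memory usage")
```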

raindrop313 commented 5 months ago

I encountered the same issue, and reducing `--num_images` did not resolve it. Since the error message indicates an out-of-memory error during the model-weight-loading phase, could you please provide an estimate of how much GPU memory is required to run this project? @frank-xwang

milky245 commented 5 months ago

Hello, I have met the same problem. I tried reducing `--num_images` to 2 or 1, and have confirmed that flash_attn runs normally. I ran the demo on an RTX 4060 with 8 GB of memory, and I would like to know how much GPU memory is needed for training and deployment. @frank-xwang Thanks, and looking forward to your reply.

[screenshot: error output]
frank-xwang commented 5 months ago

Apologies for the delayed response.

Thank you for your interest in InstanceDiffusion. I have made further optimizations to reduce the memory usage of the code. Please update to the latest version by pulling the new InstanceDiffusion code. To run the updated code, you will likely need a GPU with at least 13 GB of memory. I recently tested it locally on RTX 6000 GPUs, which have 24 GB of memory, and inference consumed about 12.8 GB. For training the model, we use A100 GPUs with 80 GB of memory.
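To sanity-check these numbers before launching, here is a minimal sketch (not part of the repo; the 13 GB threshold comes from this thread, and the 0.5 GB cushion for CUDA context overhead is an assumption):

```python
MIN_INFERENCE_GIB = 13.0  # lower bound reported in this thread

def fits_inference(total_mem_bytes: int, reserved_gib: float = 0.5) -> bool:
    """Return True if a GPU with total_mem_bytes likely fits inference.

    reserved_gib is a rough cushion for CUDA context overhead (assumption).
    """
    usable_gib = total_mem_bytes / 1024**3 - reserved_gib
    return usable_gib >= MIN_INFERENCE_GIB

# GPUs mentioned in this thread:
print(fits_inference(8 * 1024**3))   # RTX 4060, 8 GB  -> False
print(fits_inference(24 * 1024**3))  # RTX 6000, 24 GB -> True
```

In practice, the total device memory in bytes can be read with `torch.cuda.mem_get_info()`, which returns the free and total memory of the current device.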

The command I used for model inference:

CUDA_VISIBLE_DEVICES=6 python inference.py \
  --num_images 8 \
  --output OUTPUT/demo/ \
  --input_json demos/demo_cat_dog_robin.json \
  --ckpt pretrained/instancediffusion_sd15.pth \
  --test_config configs/test_box.yaml \
  --guidance_scale 7.5 \
  --alpha 0.75 \
  --seed 4 \
  --mis 0.3 \
  --cascade_strength 0.3

The memory usage is shown below:

[screenshot: GPU memory usage]

Hope it helps!