Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

Could the author share the code for calculating the model parameters (Param.) and the computational complexity (MACs) of the pipeline? #53

Closed. StormArcher closed this issue 6 months ago.

StormArcher commented 6 months ago

Could the author share the code for calculating the model parameters (Param.) and the computational complexity (MACs) of the pipeline? Thank you very much!

bokyeong1015 commented 6 months ago

Hi, we've added the code. Please run:

pip install thop==0.1.1.post2209072238
python src/count_macs_params.py
Results:

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 338.7G = 338749194240
  [U-Net] Params: 859.5M = 859520964
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1585.4G = 1585374608384
  [Total] Params: 1032.1M = 1032071623
== nota-ai/bk-sdm-base | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 223.8G = 223755632640
  [U-Net] Params: 579.4M = 579384964
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1470.4G = 1470381046784
  [Total] Params: 751.9M = 751935623
== nota-ai/bk-sdm-small | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 217.7G = 217727959040
  [U-Net] Params: 482.3M = 482346884
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1464.4G = 1464353373184
  [Total] Params: 654.9M = 654897543
== nota-ai/bk-sdm-tiny | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 205.0G = 205035274240
  [U-Net] Params: 323.4M = 323384964
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1451.7G = 1451660688384
  [Total] Params: 495.9M = 495935623
== runwayml/stable-diffusion-v1-5 | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 338.7G = 338749194240
  [U-Net] Params: 859.5M = 859520964
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1585.4G = 1585374608384
  [Total] Params: 1032.1M = 1032071623
== stabilityai/stable-diffusion-2-1-base | 512x512 img generation ==
  [Text Enc] MACs: 22.3G = 22299160576
  [Text Enc] Params: 340.4M = 340387840
  [U-Net] MACs: 339.2G = 339241205760
  [U-Net] Params: 865.9M = 865910724
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1601.6G = 1601619898368
  [Total] Params: 1255.8M = 1255788743
== stabilityai/stable-diffusion-2-1 | 768x768 img generation ==
  [Text Enc] MACs: 22.3G = 22299160576
  [Text Enc] Params: 340.4M = 340387840
  [U-Net] MACs: 760.8G = 760797839360
  [U-Net] Params: 865.9M = 865910724
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 2023.2G = 2023176531968
  [Total] Params: 1255.8M = 1255788743
StormArcher commented 6 months ago

We followed the author's code for testing, and the MACs in our results differ greatly from those in the paper. Is there a problem with how we ran the code?

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 0.2G = 232980480
  [U-Net] Params: 859.5M = 859520964
  [Img Dec] MACs: 1.0G = 981467136
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 7.8G = 7760329728
  [Total] Params: 1032.1M = 1032071623

bokyeong1015 commented 6 months ago

We obtained the results below with the refactored and uploaded code, which are identical to those presented in our paper.

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 338.7G = 338749194240
  [U-Net] Params: 859.5M = 859520964
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1585.4G = 1585374608384
  [Total] Params: 1032.1M = 1032071623

Could you share the output of pip show thop so we can check the version and ensure that you installed via pip install thop==0.1.1.post2209072238? If you could share the exact procedure or code that you ran, it would help us reproduce your issue.
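As an alternative to pip show, here is a quick convenience snippet (not part of our script, and assuming Python 3.8+) to print the installed versions of both packages:

from importlib.metadata import version

# Print the installed versions of thop and diffusers to compare environments
for pkg in ("thop", "diffusers"):
    print(pkg, version(pkg))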

StormArcher commented 6 months ago
1. I ran your code directly, just for "CompVis/stable-diffusion-v1-4", as follows:

   get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=512, txt_emb_size=768, device=device)

2. The thop version is the same as yours: import thop; print(thop.__version__) gives 0.1.1.

3. Why does the profiling run only a single step?

   batch_size = 1
   dummy_timesteps = torch.zeros(batch_size).to(device)

   How do we control the number of sampling steps?

The code is as follows:


# ------------------------------------------------------------------------------------
# Copyright 2024. Nota Inc. All Rights Reserved.
# ------------------------------------------------------------------------------------
import torch
from diffusers import StableDiffusionPipeline
from thop import profile


def count_params(model):
    return sum(p.numel() for p in model.parameters())


def get_macs_params(model_id, img_size=512, txt_emb_size=768, device="cuda", batch_size=1):
    pipeline = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    text_encoder = pipeline.text_encoder
    unet = pipeline.unet
    vae_decoder = pipeline.vae.decoder

    # text encoder
    dummy_input_ids = torch.zeros(batch_size, 77).long().to(device)  # (1, 77)
    macs_txt_enc, _ = profile(text_encoder, inputs=(dummy_input_ids,))
    macs_txt_enc = macs_txt_enc / batch_size
    params_txt_enc = count_params(text_encoder)

    # unet
    dummy_noisy_latents = torch.zeros(batch_size, 4, int(img_size/8), int(img_size/8)).to(device)  # (1, 4, 512/8, 512/8) = (1, 4, 64, 64)
    dummy_timesteps = torch.zeros(batch_size).to(device)  # single timestep index
    dummy_text_emb = torch.zeros(batch_size, 77, txt_emb_size).to(device)  # (1, 77, 768)
    # key input shapes: (1, 4, 64, 64), (1,), (1, 77, 768)
    macs_unet, _ = profile(unet, inputs=(dummy_noisy_latents, dummy_timesteps, dummy_text_emb))
    macs_unet = macs_unet / batch_size
    params_unet = count_params(unet)

    # image decoder
    dummy_latents = torch.zeros(batch_size, 4, 64, 64).to(device)  # (1, 4, 64, 64)
    macs_img_dec, _ = profile(vae_decoder, inputs=(dummy_latents,))
    macs_img_dec = macs_img_dec / batch_size
    params_img_dec = count_params(vae_decoder)

    # total
    macs_total = macs_txt_enc + macs_unet + macs_img_dec
    params_total = params_txt_enc + params_unet + params_img_dec

    # print
    print(f"== {model_id} | {img_size}x{img_size} img generation ==")
    print(f"  [Text Enc] MACs: {(macs_txt_enc/1e9):.1f}G = {int(macs_txt_enc)}")
    print(f"  [Text Enc] Params: {(params_txt_enc/1e6):.1f}M = {int(params_txt_enc)}")
    print(f"  [U-Net] MACs: {(macs_unet/1e9):.1f}G = {int(macs_unet)}")
    print(f"  [U-Net] Params: {(params_unet/1e6):.1f}M = {int(params_unet)}")
    print(f"  [Img Dec] MACs: {(macs_img_dec/1e9):.1f}G = {int(macs_img_dec)}")
    print(f"  [Img Dec] Params: {(params_img_dec/1e6):.1f}M = {int(params_img_dec)}")
    print(f"  [Total] MACs: {(macs_total/1e9):.1f}G = {int(macs_total)}")
    print(f"  [Total] Params: {(params_total/1e6):.1f}M = {int(params_total)}")


if __name__ == "__main__":
    device = "cuda:0"
    get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=512, txt_emb_size=768, device=device)
    get_macs_params(model_id="nota-ai/bk-sdm-base", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="nota-ai/bk-sdm-small", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="nota-ai/bk-sdm-tiny", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="runwayml/stable-diffusion-v1-5", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="stabilityai/stable-diffusion-2-1-base", img_size=512, txt_emb_size=1024, device=device)
    # get_macs_params(model_id="stabilityai/stable-diffusion-2-1", img_size=768, txt_emb_size=1024, device=device)
bokyeong1015 commented 6 months ago

1 - Thanks for checking.

3 - Do you mean how to control the number of denoising steps? We calculated the MACs for a single denoising step and then multiplied them by the total number of steps, which is 25. dummy_timesteps is a single scalar value for the timestep index.
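To make that arithmetic explicit, here is an illustrative sketch (using the SD-v1.4 numbers above and assuming 25 denoising steps, with the text encoder and image decoder each run once per image) of how per-image MACs would follow from the per-step numbers:

macs_txt_enc = 6545882112          # one text-encoder pass
macs_unet_per_step = 338749194240  # one U-Net pass = one denoising step
macs_img_dec = 1240079532032       # one image-decoder pass
num_steps = 25                     # number of denoising steps assumed in our setting

macs_per_image = macs_txt_enc + num_steps * macs_unet_per_step + macs_img_dec
print(f"{macs_per_image/1e9:.1f} GMACs per 512x512 image")  # about 9715.4 GMACs with these numbers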


2 - Hmm, when we ran pip install thop==0.1.1.post2209072238 and then your code, we obtained the log below. Please share the corresponding output from your side:

The log we obtained:

`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register count_normalization() for <class 'torch.nn.modules.normalization.LayerNorm'>.
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register count_normalization() for <class 'torch.nn.modules.normalization.LayerNorm'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 338.7G = 338749194240
  [U-Net] Params: 859.5M = 859520964
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1585.4G = 1585374608384
  [Total] Params: 1032.1M = 1032071623

The output of pip show thop (0.1.1.post2209072238):

Name: thop
Version: 0.1.1.post2209072238
Summary: A tool to count the FLOPs of PyTorch model.
Home-page: https://github.com/Lyken17/pytorch-OpCounter/

The output of pip show diffusers (0.15.0):

Name: diffusers
Version: 0.15.0
Summary: Diffusers
Home-page: https://github.com/huggingface/diffusers
StormArcher commented 6 months ago
1. I think the author may have forgotten to apply "/8" to img_size for the U-Net input; that would be why the U-Net computational complexity comes out as 339G, whereas with "img_size/8" it should be the same as mine (0.2G).

with img_size/8: 0.2G
with img_size/4: 0.9G
with img_size/2: 3.7G

2. I think my versions of thop and diffusers are the same as yours, but the MACs of my U-Net for "runwayml/stable-diffusion-v1-5" are as follows:

-> the output of pip show thop
Name: thop
Version: 0.1.1.post2209072238
Summary: A tool to count the FLOPs of PyTorch model.
Home-page: https://github.com/Lyken17/pytorch-OpCounter/
Author: Ligeng Zhu
Author-email: ligeng.zhu+github@gmail.com
License: MIT
Location: /opt/conda/lib/python3.8/site-packages
Requires: torch
Required-by:

-> the output of pip show diffusers
Name: diffusers
Version: 0.15.0.dev0
Summary: Diffusers
Home-page: https://github.com/huggingface/diffusers
Author: The HuggingFace team
Author-email: patrick@huggingface.co
License: Apache
Location: /opt/conda/lib/python3.8/site-packages
Editable project location: /home/pansiyuan/.jupyter/diffusers
Requires: filelock, huggingface-hub, importlib-metadata, numpy, Pillow, regex, requests
Required-by:

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 0.2G = 232980480
  [U-Net] Params: 859.5M = 859520964
  [Img Dec] MACs: 1.0G = 981467136
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 7.8G = 7760329728
  [Total] Params: 1032.1M = 1032071623

bokyeong1015 commented 6 months ago

I think the author may have forgotten to apply "/8" to img_size for the U-Net input; that would be why the U-Net computational complexity comes out as 339G, whereas with "img_size/8" it should be the same as mine (0.2G).

We don't quite follow this point. We think the division by 8 ("/8") is correctly applied in our code:

dummy_noisy_latents = torch.zeros(batch_size, 4, int(img_size/8), int(img_size/8)).to(device)

Furthermore, when we changed img_size as you suggested, we obtained the following results and were not able to reproduce your "with img_size/8: 0.2G":

# get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=512, txt_emb_size=768, device=device)

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [U-Net] MACs: 338.7G = 338749194240
  [U-Net] Params: 859.5M = 859520964
# get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=64, txt_emb_size=768, device=device)

== CompVis/stable-diffusion-v1-4 | 64x64 img generation ==
  [U-Net] MACs: 6.8G = 6773345280
  [U-Net] Params: 859.5M = 859520964
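For reference, here is a minimal standalone sketch (assuming the same diffusers and thop versions as above; our script does essentially this inside the full pipeline) that profiles only the U-Net at an explicit 64x64 latent resolution, i.e., 512/8. It should reproduce the ~338.7G per-step figure reported above:

import torch
from diffusers import UNet2DConditionModel
from thop import profile

device = "cuda:0"
# Load only the U-Net of SD-v1.4 and profile a single forward pass at 64x64 latents (512/8).
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
).to(device)
dummy_latents = torch.zeros(1, 4, 64, 64).to(device)   # latent spatial size = img_size / 8
dummy_timesteps = torch.zeros(1).to(device)             # single timestep index
dummy_text_emb = torch.zeros(1, 77, 768).to(device)     # SD-v1.x text embedding dim = 768
macs, _ = profile(unet, inputs=(dummy_latents, dummy_timesteps, dummy_text_emb))
print(f"[U-Net] MACs per denoising step: {macs/1e9:.1f}G")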

Thanks for checking. Unfortunately, we are unsure whether we can provide further assistance at this moment, as we were not able to reproduce the issue you described.