Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

Scale of KD-feature loss for SD inpainting 1.5 #21

Closed Bikesuffer closed 1 year ago

Bikesuffer commented 1 year ago

Hi there,

I am trying to distill the UNet in SD-inpainting 1.5 into a smaller UNet using your code (I replaced the pipeline with the inpainting one and changed the input data). I have trained for 130K steps with batch size 64, and the kd_feat_loss is currently around 20.

I am wondering what kd_feat_loss you had when you finished distilling the UNet in your experiments.

Thank you.
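(For context: BK-SDM's KD-feature loss is an MSE between teacher and student feature maps at matched block outputs. The following is only a minimal sketch of that kind of loss with hypothetical feature tensors, not the repo's exact implementation, so absolute loss values are not directly comparable:)

```python
import torch
import torch.nn.functional as F

def kd_feat_loss(teacher_feats, student_feats):
    """MSE between teacher and student feature maps at matched anchor
    points, summed over the anchors (one common formulation)."""
    return sum(
        F.mse_loss(s, t.detach()) for t, s in zip(teacher_feats, student_feats)
    )

# Hypothetical feature maps from two matched UNet blocks.
t_feats = [torch.randn(2, 320, 32, 32), torch.randn(2, 640, 16, 16)]
s_feats = [f + 0.1 * torch.randn_like(f) for f in t_feats]
loss = kd_feat_loss(t_feats, s_feats)
```

Because the loss sums per-anchor MSEs, its scale depends on how many anchor points are used and on the feature magnitudes, which differ between pipelines.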

bokyeong1015 commented 1 year ago

Hi, thanks for utilizing our work, glad to know that 😊 Although we haven't attempted inpainting experiments, we hope the following information can be helpful.


Here is a loss curve from our code for text-to-image synthesis, with SD-v1.4 and batch size 64 (= gradient accumulation 4 x mini batch size 16), plotted with 500-point moving average:
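(The effective batch size 64 via gradient accumulation, 4 steps of mini-batch 16, can be sketched in plain PyTorch as below; the tiny model and random data are hypothetical placeholders to illustrate only the batching scheme, not the repo's training loop:)

```python
import torch

# Hypothetical tiny model/data; only the accumulation scheme matters here.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

grad_accum_steps = 4   # 4 accumulation steps
mini_batch_size = 16   # x mini-batch 16 = effective batch size 64

optimizer.zero_grad()
for step in range(grad_accum_steps):
    x = torch.randn(mini_batch_size, 8)
    y = torch.randn(mini_batch_size, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale so the accumulated gradient matches one batch-64 update.
    (loss / grad_accum_steps).backward()
optimizer.step()
```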

[Image: loss curve, batch size 64 (loss_curve_batchsz64_230822)]

bokyeong1015 commented 1 year ago

Please understand that we've changed the name of this issue from 'Batch Size' to 'Scale of KD-feature loss for SD inpainting 1.5', to clarify the topic and make it easier for people to find in the future.

Bikesuffer commented 1 year ago

Thanks a lot for the information.

yajieC commented 1 year ago

hello, does this method work for SD inpainting 1.5?

bokyeong1015 commented 1 year ago

Hi @yajieC, we haven't tried it, but we believe our models can be used for SD-inpainting after finetuning.

Our models are compressed from SD-v1.4, and the SD-v1.x models share the same architecture (with different training recipes); SD-inpainting is also based on the SD-v1 backbone.

Bikesuffer commented 1 year ago

> hello, does this method work for SD inpainting 1.5?

Yes, it worked for me. I have successfully distilled the UNet in SD-inpainting 1.5 into a smaller UNet. I would say the SD base model distilled with batch size 256 (I call it IP_Base_256) generates the best results for me.

bokyeong1015 commented 1 year ago

Thanks for sharing the above and this good news! Happy to know you are okay with the inpainting results using our approach :) Could we ask if you have plans to release your models and/or code?


Edit: sorry for the initial misunderstanding; you've clarified that you "distill the unet in sd inpainting 1.5 to a smaller Unet", which means (Teacher, Student) = (SD-inpainting 1.5, BK-SDM modified to use additional input channels) <- please let us know if this is incorrect. Thanks again for sharing! @Bikesuffer

Bikesuffer commented 1 year ago

> Thanks for sharing the above and this good news! Happy to know you are okay with the inpainting results using our approach :) Could we ask if you have plans to release your models and/or code?
>
> Edit: sorry for initial misunderstanding, you've clarified that "distill the unet in sd inpainting 1.5 to a smaller Unet", which means (Teacher, Student) = (SD-inpainting 1.5, BK-SDM) <- please let us know if this is incorrect. Thanks again for sharing! @Bikesuffer

Hi, actually the student is a modified version of BK-SDM, since the input of the UNet in the inpainting pipeline has 9 channels. But all the anchor points for calculating the loss are the same as in BK-SDM.
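(For readers wondering how the student's first layer might be adapted to the inpainting pipeline's 9-channel input, here is a minimal, hypothetical PyTorch sketch of one common trick: expand `conv_in` and zero-initialize the extra channels. This is an illustration, not necessarily what was done above:)

```python
import torch
import torch.nn as nn

def expand_conv_in(conv_in: nn.Conv2d, new_in_channels: int = 9) -> nn.Conv2d:
    """Replace a 4-channel input conv with a 9-channel one.

    Original weights are copied into the first 4 input channels; the
    extra channels are zero-initialized, so the expanded layer initially
    behaves exactly like the original on the latent channels.
    """
    new_conv = nn.Conv2d(
        new_in_channels,
        conv_in.out_channels,
        kernel_size=conv_in.kernel_size,
        stride=conv_in.stride,
        padding=conv_in.padding,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, : conv_in.in_channels] = conv_in.weight
        new_conv.bias.copy_(conv_in.bias)
    return new_conv

# Stand-in for the UNet's conv_in (SD-v1 uses 4 -> 320, 3x3 kernels).
old = nn.Conv2d(4, 320, kernel_size=3, padding=1)
new = expand_conv_in(old, new_in_channels=9)
x = torch.randn(1, 9, 64, 64)
out = new(x)  # shape: (1, 320, 64, 64)
```

In SD-v1 inpainting, the 9 channels are typically 4 latent + 1 mask + 4 masked-image-latent channels.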

bokyeong1015 commented 1 year ago

Thanks for the clarification; we've updated the student description above :)

yajieC commented 12 months ago

hi, I tried this method, but found that the performance was very poor. My experimental configuration was to train on laion_11k data for 10k steps, and the unet is bk_tiny. And I also replaced the pipeline to inpainting and the input data. I would like to ask you for any good suggestions, thanks.

bokyeong1015 commented 12 months ago

@yajieC Thanks for your inquiry. Since it seems to be a different topic, we would like to address this in a separate discussion to make it easier for future readers to find. Please kindly refer to our response at that link.