Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

Scale of KD-feature loss for SD inpainting 1.5 #21

Closed Bikesuffer closed 1 year ago

Bikesuffer commented 1 year ago

Hi there,

I am trying to distill the UNet in SD-inpainting 1.5 into a smaller UNet using your code (I replaced the pipeline with the inpainting one and changed the input data). I have trained for 130K steps with batch size 64, and the kd_feat_loss is currently around 20.

I am wondering what kd_feat_loss you had when you finished distilling the UNet in your experiments.

Thank you.
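(For context: BK-SDM's KD-feature loss is an MSE between teacher and student feature maps at matched block outputs. The following is only a minimal sketch of that kind of loss with hypothetical feature tensors, not the repo's exact implementation, so absolute loss values are not directly comparable:)

```python
import torch
import torch.nn.functional as F

def kd_feat_loss(teacher_feats, student_feats):
    """MSE between teacher and student feature maps at matched anchor
    points, summed over the anchors (one common formulation)."""
    return sum(
        F.mse_loss(s, t.detach()) for t, s in zip(teacher_feats, student_feats)
    )

# Hypothetical feature maps from two matched UNet blocks.
t_feats = [torch.randn(2, 320, 32, 32), torch.randn(2, 640, 16, 16)]
s_feats = [f + 0.1 * torch.randn_like(f) for f in t_feats]
loss = kd_feat_loss(t_feats, s_feats)
```

Because the loss sums per-anchor MSEs, its scale depends on how many anchor points are used and on the feature magnitudes, which differ between pipelines.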

bokyeong1015 commented 1 year ago

Hi, thanks for utilizing our work, glad to know that 😊 Although we haven't attempted inpainting experiments, we hope the following information can be helpful.


Here is a loss curve from our code for text-to-image synthesis, with SD-v1.4 and batch size 64 (= gradient accumulation 4 x mini batch size 16), plotted with 500-point moving average:
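(The effective batch size 64 via gradient accumulation, 4 steps of mini-batch 16, can be sketched in plain PyTorch as below; the tiny model and random data are hypothetical placeholders to illustrate only the batching scheme, not the repo's training loop:)

```python
import torch

# Hypothetical tiny model/data; only the accumulation scheme matters here.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

grad_accum_steps = 4   # 4 accumulation steps
mini_batch_size = 16   # x mini-batch 16 = effective batch size 64

optimizer.zero_grad()
for step in range(grad_accum_steps):
    x = torch.randn(mini_batch_size, 8)
    y = torch.randn(mini_batch_size, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale so the accumulated gradient matches one batch-64 update.
    (loss / grad_accum_steps).backward()
optimizer.step()
```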

[Image: loss curve, batch size 64 (loss_curve_batchsz64_230822)]

bokyeong1015 commented 1 year ago

Please understand that we've changed the name of this issue from 'Batch Size' to 'Scale of KD-feature loss for SD inpainting 1.5', to clarify the topic and make it easier for people to find in the future.

Bikesuffer commented 1 year ago

Thanks a lot for the information.

yajieC commented 1 year ago

hello, does this method work for SD inpainting 1.5?

bokyeong1015 commented 1 year ago

Hi @yajieC, we haven't tried it, but we believe our models can be used for SD-inpainting after finetuning.

Our models are compressed from SD-v1.4, and the SD-v1.x models share the same architecture (with different training recipes); SD-inpainting is also based on the SD-v1 backbone.

Bikesuffer commented 1 year ago

> hello, does this method work for SD inpainting 1.5?

Yes, it worked for me. I have successfully distilled the UNet in SD-inpainting 1.5 into a smaller UNet. I would say the SD base model distilled with batch size 256 (I call it IP_Base_256) generates the best results for me.

bokyeong1015 commented 1 year ago

Thanks for sharing the above and this good news! Happy to know you are okay with the inpainting results using our approach :) Could we ask if you have plans to release your models and/or code?


Edit: sorry for the initial misunderstanding; you've clarified that you "distill the unet in sd inpainting 1.5 to a smaller Unet", which means (Teacher, Student) = (SD-inpainting 1.5, BK-SDM modified to use additional input channels) <- please let us know if this is incorrect. Thanks again for sharing! @Bikesuffer

Bikesuffer commented 1 year ago

> Thanks for sharing the above and this good news! Happy to know you are okay with the inpainting results using our approach :) Could we ask if you have plans to release your models and/or code?
>
> Edit: sorry for initial misunderstanding, you've clarified that "distill the unet in sd inpainting 1.5 to a smaller Unet", which means (Teacher, Student) = (SD-inpainting 1.5, BK-SDM) <- please let us know if this is incorrect. Thanks again for sharing! @Bikesuffer

Hi, actually the student is a modified version of BK-SDM, since the input of the UNet in the inpainting pipeline has 9 channels. But all the anchor points for calculating the loss are the same as in BK-SDM.
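(For readers wondering how the student's first layer might be adapted to the inpainting pipeline's 9-channel input, here is a minimal, hypothetical PyTorch sketch of one common trick: expand `conv_in` and zero-initialize the extra channels. This is an illustration, not necessarily what was done above:)

```python
import torch
import torch.nn as nn

def expand_conv_in(conv_in: nn.Conv2d, new_in_channels: int = 9) -> nn.Conv2d:
    """Replace a 4-channel input conv with a 9-channel one.

    Original weights are copied into the first 4 input channels; the
    extra channels are zero-initialized, so the expanded layer initially
    behaves exactly like the original on the latent channels.
    """
    new_conv = nn.Conv2d(
        new_in_channels,
        conv_in.out_channels,
        kernel_size=conv_in.kernel_size,
        stride=conv_in.stride,
        padding=conv_in.padding,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, : conv_in.in_channels] = conv_in.weight
        new_conv.bias.copy_(conv_in.bias)
    return new_conv

# Stand-in for the UNet's conv_in (SD-v1 uses 4 -> 320, 3x3 kernels).
old = nn.Conv2d(4, 320, kernel_size=3, padding=1)
new = expand_conv_in(old, new_in_channels=9)
x = torch.randn(1, 9, 64, 64)
out = new(x)  # shape: (1, 320, 64, 64)
```

In SD-v1 inpainting, the 9 channels are typically 4 latent + 1 mask + 4 masked-image-latent channels.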

bokyeong1015 commented 1 year ago

Thanks for the clarification; we've updated the student description above :)

yajieC commented 12 months ago

hi, I tried this method, but found that the performance was very poor. My experimental configuration was to train on laion_11k data for 10k steps, and the unet is bk_tiny. And I also replaced the pipeline to inpainting and the input data. I would like to ask you for any good suggestions, thanks.

bokyeong1015 commented 12 months ago

@yajieC Thanks for your inquiry. Since it seems to be a different topic, we would like to address this in a separate discussion to make it easier for future readers to find. Please kindly refer to our response at that link.