Junyi42 / sd-dino

Official Implementation of paper "A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence"
https://sd-complements-dino.github.io

Questions about sd features #10

Closed sixi111 closed 1 month ago

sixi111 commented 11 months ago

Hello, I would like to know whether the layer 2, 5, 8 features mentioned in the paper refer to the actual layers 2, 5, and 8, or to the layers whose output has been processed by the UpSample block. In other words, are the features the results obtained after the UpSample block? I find the feature extraction in the code a bit hard to follow. I hope to receive your reply. Thank you!

sixi111 commented 11 months ago

I'm sorry, my question above was worded imprecisely. By "the layers processed by the UpSample block" I mean the layers when counting from 1, specifically layers 3, 6, and 9.

Junyi42 commented 10 months ago

Hi,

Thanks for your interest in our work!

The layer indices (2, 5, 8) are passed to the load_model() function in the extractor_sd.py file and then serve as the unet_block_indices argument of the feature extractor.

Then, you can check the exact implementation of how the features are extracted in the file ldm.py by searching for the key unet_block_indices.
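
To illustrate the counting question, here is a minimal toy sketch in plain Python. It is not the repository's code: collect_block_outputs and toy_blocks are hypothetical names, the "blocks" are simple functions rather than UNet blocks, and it assumes the indices are matched against a zero-based enumerate() counter. Under that assumption, indices (2, 5, 8) pick the 3rd, 6th, and 9th blocks when counting from 1. Please confirm the actual convention in ldm.py.

```python
# Toy sketch only: the "blocks" are plain functions, not the real UNet blocks.
def collect_block_outputs(blocks, x, block_indices):
    """Run x through the blocks in order and keep the outputs at the selected indices."""
    selected = set(block_indices)
    features = []
    for idx, block in enumerate(blocks):  # enumerate() counts from 0 (assumed convention)
        x = block(x)
        if idx in selected:
            features.append((idx, x))
    return features


# Twelve hypothetical blocks; block i simply adds i + 1 to its input.
toy_blocks = [lambda v, k=i + 1: v + k for i in range(12)]

features = collect_block_outputs(toy_blocks, 0, (2, 5, 8))
for idx, value in features:
    print(f"zero-based index {idx} = block number {idx + 1} counted from 1, output {value}")
```

If the indices were instead matched against a one-based counter, the same tuple would select different blocks, which is why the enumerate() convention in the extractor is the detail to check.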

Hope this is helpful, and feel free to ask any further questions if you need more details!