TyroneLi opened 1 year ago
About 'K-Means Clustering of Frozen Diffusion Features': how do you apply this to the dataset? The LDM takes text input to generate new image samples, so what do you feed it, which layers' latent feature maps do you take, and how do you run the k-means clustering? Many thanks.
I guess this idea derives from the paper F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models.
This paper is so similar to F-VLM that the moment I saw "K-Means Clustering of Frozen Diffusion Features," alarm bells kept ringing in my head, lol. Nonetheless, it's still excellent work.
Just wondering: are there any updates on this? Has anyone been able to reproduce the middle figure below? Any help would be appreciated!
Has anyone reproduced this?
Same question here.
Any updates?
I have the same question.
+1
+1
I would like to discuss with everyone how this part of the ODISE paper is implemented:
My current implementation is as follows:
In the code, 'cfeatures' is the feature pyramid extracted by Stable Diffusion. I fuse the features from all levels of this pyramid, roughly following "Panoptic Feature Pyramid Networks."
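Since the code itself isn't shown here, below is a minimal NumPy sketch of one training-free way to fuse such a pyramid. It assumes 'cfeatures' is a list of (C, H, W) arrays ordered coarse to fine; the helper names are hypothetical. Panoptic FPN's semantic branch instead upsamples each level to 1/4 scale through learned 3×3 conv stages and sums them. With frozen features and no training there are no learned convs, so this sketch swaps in nearest-neighbour upsampling, per-level L2 normalisation, and channel concatenation. It is not ODISE's method or the poster's code.

```python
import numpy as np


def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = (np.arange(out_h) * h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * w) // out_w  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def l2_normalize(feat, eps=1e-8):
    """Scale every pixel's channel vector to unit L2 norm."""
    return feat / (np.linalg.norm(feat, axis=0, keepdims=True) + eps)


def fuse_pyramid(cfeatures):
    """Training-free fusion of (C_i, H_i, W_i) maps into one (sum C_i, H, W) map.

    Every level is resized to the finest resolution and L2-normalised per pixel,
    so each level's block has unit norm regardless of its channel count or scale,
    then the levels are concatenated along the channel axis.
    """
    out_h = max(f.shape[1] for f in cfeatures)
    out_w = max(f.shape[2] for f in cfeatures)
    levels = [l2_normalize(upsample_nearest(f, out_h, out_w)) for f in cfeatures]
    return np.concatenate(levels, axis=0)


# Made-up shapes standing in for a coarse-to-fine pyramid (not real Stable Diffusion sizes).
rng = np.random.default_rng(0)
cfeatures = [
    rng.standard_normal((64, 8, 8)),
    rng.standard_normal((32, 16, 16)),
    rng.standard_normal((16, 32, 32)),
]
fused = fuse_pyramid(cfeatures)
print(fused.shape)  # (112, 32, 32)
```

Concatenating keeps every level's information without a learned projection, at the cost of a wide per-pixel vector. Bilinear upsampling would give smoother boundaries than nearest-neighbour.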
Currently, the results of combining KMeans clustering with the feature pyramid extracted by Stable Diffusion are as follows (please ignore the text labels):
The clustering results obtained using KMeans alone are:
How can I improve the result?
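For anyone trying to reproduce the figure, here is a plain-NumPy sketch of the clustering step. It runs per-pixel K-Means over a (C, H, W) feature map, such as a fused feature pyramid, and reshapes the labels back to an (H, W) segment map. Whether this matches what ODISE actually did is an assumption; this thread does not state the layers, timestep, or k the authors used. The helpers 'kmeans' and 'cluster_feature_map' are hypothetical. In practice, sklearn.cluster.KMeans (with n_clusters and n_init) does the same job.

```python
import numpy as np


def _sq_dists(x, centers):
    # Squared Euclidean distances between rows of x (N, D) and centers (k, D) -> (N, k)
    d = (x ** 2).sum(1)[:, None] - 2.0 * x @ centers.T + (centers ** 2).sum(1)[None, :]
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def _kmeans_pp_init(x, k, rng):
    # k-means++ seeding: each new center is sampled with probability proportional to D(x)^2
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = _sq_dists(x, np.stack(centers)).min(1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(x), 1.0 / len(x))
        centers.append(x[rng.choice(len(x), p=probs)])
    return np.stack(centers).astype(float)


def kmeans(x, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the run with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = _kmeans_pp_init(x, k, rng)
        for _ in range(n_iter):
            labels = _sq_dists(x, centers).argmin(1)
            new_centers = centers.copy()
            for j in range(k):
                members = x[labels == j]
                if len(members):  # an emptied cluster keeps its previous center
                    new_centers[j] = members.mean(0)
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        d = _sq_dists(x, centers)
        labels = d.argmin(1)  # final assignment to the final centers
        inertia = d[np.arange(len(x)), labels].sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


def cluster_feature_map(feat, k, seed=0):
    """Cluster a (C, H, W) feature map per pixel and return an (H, W) segment-id map."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T  # (H*W, C): one feature vector per pixel
    labels, _ = kmeans(x, k, seed=seed)
    return labels.reshape(h, w)


# Toy example: a synthetic 8-channel map whose left and right halves differ.
rng = np.random.default_rng(0)
feat = rng.normal(scale=0.05, size=(8, 16, 16))
feat[0, :, :8] += 3.0  # left half shifted along channel 0
feat[1, :, 8:] += 3.0  # right half shifted along channel 1
seg = cluster_feature_map(feat, k=2)
```

On real diffusion features, the choice of k, the feature layer, and any per-pixel normalisation usually affect the segments much more than the clustering code does.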