NVlabs / ODISE

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
https://arxiv.org/abs/2303.04803

Can you share implementation details for the 'K-Means Clustering of Frozen Diffusion Features' result? #3

Open TyroneLi opened 1 year ago

TyroneLi commented 1 year ago

Regarding 'K-Means Clustering of Frozen Diffusion Features': how do you run this on the dataset? The LDM takes a text input to generate new image samples, so what do you feed it, which layers' latent feature maps do you extract, and how do you perform the K-Means clustering? Many thanks.

KinGeorge commented 1 year ago

I guess this idea derives from the paper "F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models".

BingliangLi commented 1 year ago

This paper is so similar to F-VLM that the moment I saw "K-Means Clustering of Frozen Diffusion Features," it just kept beeping in my head, lol. Nonetheless, it's still excellent work.

Tsingularity commented 1 year ago

Just wondering, are there any updates on this? Has anyone been able to reproduce the middle figure below? Any help would be appreciated!

[image]

JayKarhade commented 1 year ago

Has anyone reproduced this?

zgzxy001 commented 1 year ago

Same question here.

yxchng commented 1 year ago

Any updates?

Neyleer commented 1 year ago

I have the same question.

nhw649 commented 9 months ago

+1

jakub-prokop commented 8 months ago

+1

jianghongjie328 commented 2 months ago

I would like to discuss with everyone how this part of the ODISE paper is implemented: [image]

My current approach, in code, is as follows: [image]

In the code, 'cfeatures' refers to the feature pyramid extracted by Stable Diffusion. I fuse the features from the different levels of this pyramid, roughly following "Panoptic Feature Pyramid Networks."
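Since the code above is only available as a screenshot, here is a minimal sketch of what fusing a feature pyramid can look like. It is not the ODISE authors' code and not the code in the screenshot. It assumes 'cfeatures' is a list of (C, H, W) arrays ordered finest level first, with resolutions that differ by integer factors; the shapes and the helper names `upsample_nearest` and `fuse_pyramid` are hypothetical. Panoptic FPN merges levels by summation after learned convolutions; because this sketch has no learned layers, it swaps in per-pixel L2 normalisation followed by channel concatenation.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_pyramid(features):
    """Fuse a list of (C_i, H_i, W_i) maps, finest level first.

    Each level is upsampled to the resolution of the first level,
    L2-normalised per pixel so no level dominates by magnitude,
    and concatenated along the channel axis.
    """
    target_h, target_w = features[0].shape[1:]
    fused = []
    for feat in features:
        factor = target_h // feat.shape[1]
        # Only integer scale factors are supported in this sketch.
        assert feat.shape[1] * factor == target_h
        assert feat.shape[2] * factor == target_w
        up = upsample_nearest(feat, factor)
        norm = np.linalg.norm(up, axis=0, keepdims=True)
        fused.append(up / np.maximum(norm, 1e-8))
    return np.concatenate(fused, axis=0)


# Hypothetical stand-in for the pyramid: random features at strides 1, 2, 4.
rng = np.random.default_rng(0)
cfeatures = [
    rng.standard_normal((channels, 64 // stride, 64 // stride))
    for channels, stride in [(32, 1), (64, 2), (128, 4)]
]
fused = fuse_pyramid(cfeatures)
print(fused.shape)  # (224, 64, 64)
```

Concatenation keeps every level's channels intact, at the cost of a higher-dimensional feature per pixel. Summation, as in Panoptic FPN, would first require projecting all levels to a common channel count.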

Currently, the result of applying KMeans clustering to the fused feature pyramid extracted by Stable Diffusion is (please ignore the text labels): [image]

The clustering result obtained with KMeans alone is: [image]

How can I improve the results?
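For reference, the per-pixel clustering step discussed in this thread can be sketched as follows. This is a plain Lloyd's K-Means in NumPy, written under the assumption that the input is a single (C, H, W) feature map. It is not the procedure used in the ODISE paper, whose details are not given in this thread. The function names `assign`, `kmeans` and `cluster_feature_map`, the number of clusters, and the synthetic feature map are all hypothetical.

```python
import numpy as np


def assign(points, centers):
    """Return the index of the nearest center for each row of `points`."""
    # Squared Euclidean distances via ||p||^2 - 2 p.c + ||c||^2, shape (N, k).
    d2 = (
        (points ** 2).sum(axis=1, keepdims=True)
        - 2.0 * points @ centers.T
        + (centers ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)


def kmeans(points, k, n_iters=20, seed=0):
    """Lloyd's K-Means on an (N, D) array; returns one cluster id per point."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return assign(points, centers)


def cluster_feature_map(feature_map, k):
    """Cluster the pixels of a (C, H, W) feature map into an (H, W) label map."""
    c, h, w = feature_map.shape
    points = feature_map.reshape(c, h * w).T  # one D=C feature vector per pixel
    return kmeans(points, k).reshape(h, w)


# Synthetic example: left and right halves have clearly different features.
rng = np.random.default_rng(0)
feature_map = np.zeros((8, 16, 16))
feature_map[0, :, :8] = 1.0
feature_map[1, :, 8:] = 1.0
feature_map += 0.01 * rng.standard_normal(feature_map.shape)

label_map = cluster_feature_map(feature_map, k=2)
print(label_map.shape)  # (16, 16)
```

Cluster ids are arbitrary, so label maps from different images or runs are only comparable up to a permutation of the ids.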