MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

discussion on different concepts results #20

Closed bakachan19 closed 1 year ago

bakachan19 commented 1 year ago

Hi.

Thank you for this library. It is really helpful. I am using concept modeling to cluster images and do some analysis on the results. I modified find_concepts() (which was originally meant to find the top 5 related concepts for a search term) so that it finds the top 5 related concepts for an image instead, by simply passing the path to an image and obtaining its embedding with the embedding model. However, I noticed that in many cases the top-1 most related cluster is different from the cluster returned by fit_transform(). Sometimes the assigned concept is in second position, but in many cases it is ranked lower than second. Any idea why this might be happening?

Thank you for your time. Best wishes.

MaartenGr commented 1 year ago

The find_concepts function is merely a quick search function and does not behave the same way .transform does. .find_concepts computes the cosine similarity between the image embedding and the concept embeddings to quickly find a match. That is not an exact representation of what happens during .fit, which involves dimensionality reduction and clustering, so the top-ranked concept from the search does not have to match the cluster an image was assigned to.
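To make the mismatch concrete, here is a minimal toy sketch in plain Python. It is not the library's code: the 2-D vectors, the hand-assigned `labels`, and the helper names (`cosine`, `mean_vector`, `rank_concepts`) are all made up for illustration, and it assumes a concept embedding is simply the mean of its members' embeddings, which may not be how the library builds them. It only shows that ranking concepts by cosine similarity can put a different concept first than the cluster an image was assigned to.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_concepts(image_embedding, concept_embeddings):
    """Return concept indices ordered from most to least cosine-similar."""
    sims = [cosine(image_embedding, c) for c in concept_embeddings]
    return sorted(range(len(sims)), key=lambda k: sims[k], reverse=True)


# Toy image embeddings (hypothetical). Cluster 0 is spread over a wide angle,
# as can happen when clustering is done in a reduced space; cluster 1 is tight.
embeddings = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [1.0, 0.2], [1.0, 0.3]]
labels = [0, 0, 0, 1, 1]  # hand-assigned stand-in for the clustering output

# Assumption: one embedding per concept, taken as the mean of its members.
concept_embeddings = [
    mean_vector([e for e, l in zip(embeddings, labels) if l == k])
    for k in sorted(set(labels))
]

image_index = 0
ranking = rank_concepts(embeddings[image_index], concept_embeddings)
assigned = labels[image_index]
position = ranking.index(assigned) + 1  # 1-based rank of the assigned cluster

print("assigned cluster:", assigned)
print("cosine ranking:", ranking)
print("assigned cluster is at position:", position)
```

In this toy setup the first image belongs to cluster 0, but the mean of cluster 1 points in a direction closer to it, so the cosine ranking puts cluster 1 first and the assigned cluster second.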

bakachan19 commented 1 year ago

Ohh, I see. Thank you!