MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

How can we get probabilities for all clusters in transform function? #17

Open suprateek-19 opened 1 year ago

suprateek-19 commented 1 year ago

Currently, concept_model.transform() only returns the predicted class. Can we get the predicted probabilities for each cluster, or for the top n clusters?

MaartenGr commented 1 year ago

That is currently not implemented. However, you can use the internal hdbscan model (concept_model.hdbscan_model) to extract the probabilities using its approximate_predict or hdbscan.membership_vector functions.

shilpiag123 commented 1 year ago

I get the following error when trying the above, which is also a known issue. Is there any other way to get a probability distribution across concepts for images? AttributeError: 'HDBSCAN' object has no attribute 'approximate_predict'

MaartenGr commented 1 year ago

@shilpiag123 You should use it as follows:

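import hdbscan
# cluster_model is the fitted HDBSCAN model (concept_model.hdbscan_model);
# embeddings must be in the same space that model was fitted on.
probabilities = hdbscan.membership_vector(cluster_model, embeddings)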

Having said that, you will have to access the cluster model and pre-calculate the embeddings yourself. I would instead advise using BERTopic v0.15, which now has support for topic modeling with images, very similar to Concept.
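
For the "top n clusters" part of the original question, the matrix returned by hdbscan.membership_vector (one row per sample, one column per cluster) can be reduced to the n most probable clusters per sample. The following is a minimal sketch in plain Python. The helper name top_n_clusters is hypothetical and is not part of Concept or hdbscan, and the example matrix is made up for illustration.

```python
from typing import List, Sequence, Tuple


def top_n_clusters(
    membership: Sequence[Sequence[float]], n: int = 3
) -> List[List[Tuple[int, float]]]:
    """Return the n most probable (cluster_index, probability) pairs per sample.

    `membership` is assumed to have one row per sample and one column per
    cluster, e.g. the output of hdbscan.membership_vector.
    """
    results = []
    for row in membership:
        # Sort by probability (highest first); ties go to the lower cluster index.
        ranked = sorted(enumerate(row), key=lambda pair: (-pair[1], pair[0]))
        results.append([(idx, float(prob)) for idx, prob in ranked[:n]])
    return results


# Illustrative (made-up) membership matrix: 2 samples, 3 clusters.
example = [[0.1, 0.7, 0.2], [0.5, 0.05, 0.3]]
top_two = top_n_clusters(example, n=2)
```

Because the helper only iterates over rows, it should also accept a NumPy array directly, so the probabilities variable from the snippet above could be passed in as-is.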