MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

Questions #6

Open erwanlenagard opened 2 years ago

erwanlenagard commented 2 years ago

Hello,

Thank you for sharing your great work. I'd like to better understand the "fit_transform" function.

How do you intend the "image_names" parameter to be used? For instance, I'd like to classify Facebook posts. Does it mean that I can pass post messages together with image embeddings to improve the topic results? Can you share a code example using this parameter?

Is it possible to return the top keywords describing each topic? As far as I understand your code, 'fit_transform' returns only the list of topic predictions.

Thank you very much

MaartenGr commented 2 years ago

The image_names parameter is essentially the list of paths to the images, such as those Facebook images, that you intend to cluster. You can then enrich those clusters with textual information, such as the Facebook messages. You can do it like this:

from concept import ConceptModel

concept_model = ConceptModel()
concepts = concept_model.fit_transform(paths_to_my_facebook_images, docs=list_of_my_facebook_messages)

Then, you can visualize the top keywords describing each concept with concept_model.visualize_concepts(). To get the actual keywords, you can access them through concept_model.topics.

I would advise going through the example in the documentation to get a feel of how the application works.

erwanlenagard commented 2 years ago

Thank you for your explanation. I just don't get the difference between 'images' and 'image_names' parameters in the documentation.


xinli2008 commented 2 years ago

Hello @erwanlenagard! Have you resolved the confusion between the 'images' and 'image_names' parameters? If so, could you share a code example? Best

MaartenGr commented 2 years ago

Apologies for the very late response! Seems this got lost in my inbox somewhere...

> I just don't get the difference between 'images' and 'image_names' parameters in the documentation.

Ah, my apologies, that does indeed confuse things! My earlier response was not all that accurate: in it, I was referring to images, not image_names.

The image_names variable actually does not do anything except track some of the image names internally with the intention of extracting representative images for each cluster. In practice, I should have removed that variable as it does not influence the application in any way, and extracting representative images is not yet implemented. For now, you can ignore it!
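Since a code example was requested, here is a minimal sketch of what this could look like in practice. It passes the image paths as the first argument (images), passes the messages as docs, and leaves image_names unset. The helper collect_image_paths, the function fit_concepts, and the extension list are hypothetical names used only for illustration; they are not part of Concept.

```python
from pathlib import Path
from typing import List, Sequence

# Hypothetical default; adjust to whatever image formats you actually have.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def collect_image_paths(folder: str, extensions: Sequence[str] = IMAGE_EXTENSIONS) -> List[str]:
    """Return the sorted paths of image files located directly inside `folder`."""
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in allowed
    )


def fit_concepts(image_paths: List[str], messages: List[str]):
    """Cluster the images and enrich the clusters with text from `messages`."""
    # Imported lazily so the path helper above works without Concept installed.
    from concept import ConceptModel

    concept_model = ConceptModel()
    # The paths are passed as `images` (first argument); `image_names` is
    # deliberately omitted because it does not influence the result.
    concepts = concept_model.fit_transform(image_paths, docs=messages)
    return concept_model, concepts
```

With a folder of downloaded images (for example a hypothetical facebook_images directory), this would be called as fit_concepts(collect_image_paths("facebook_images"), list_of_my_facebook_messages).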