MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

Multilingual support #15

Open scr255 opened 1 year ago

scr255 commented 1 year ago

Code for English:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(images, docs)
# Works correctly!

The guide suggests: "Use Concept(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages." Following it:

from concept import Concept
# ImportError: cannot import name 'Concept' from 'concept'. I assume the guide means ConceptModel.

Importing ConceptModel:

from concept import ConceptModel
concept_model = ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1")
concepts = concept_model.fit_transform(images, docs)
# TypeError: 'JpegImageFile' object is not subscriptable

MaartenGr commented 1 year ago

Hmmm, there might be something going wrong with the images that you pass to the model. Did the code work for you with the English version?

scr255 commented 1 year ago

Yes, the English model "clip-ViT-B-32" is working fine, while "clip-ViT-B-32-multilingual-v1" throws the error.

I've also tried a different dataset (all images in .jpeg format), and the same error occurs.

MaartenGr commented 1 year ago

Unfortunately, then there seems to be an issue with how that specific model processes the images. You could try embedding the images with SentenceTransformers directly and then passing the embeddings to fit_transform through the image_embeddings parameter. That way, you can also check whether a specific image in your dataset is causing the problem.
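A minimal sketch of that suggestion, assuming `images` is a list of image file paths and that `fit_transform` accepts precomputed vectors through `image_embeddings` as described above. Images are embedded one at a time, so a failing file is reported instead of aborting the whole run. If every image fails, the model is the problem rather than the data. Note that the sentence-transformers documentation describes `clip-ViT-B-32-multilingual-v1` as a multilingual *text* encoder aligned to `clip-ViT-B-32` and uses the original `clip-ViT-B-32` for images. For that reason the sketch embeds images with `clip-ViT-B-32` and keeps the multilingual model for the text. The helper names `embed_images_checked`, `load_rgb` and `fit_multilingual` are hypothetical, not part of Concept.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def embed_images_checked(
    image_paths: Sequence[str],
    encode: Callable[[list], np.ndarray],
    load: Callable[[str], object],
) -> Tuple[List[str], np.ndarray, List[Tuple[str, str]]]:
    """Embed images one by one, collecting failures instead of raising.

    Returns the paths that were embedded, their embeddings (one row per
    kept path, in the same order) and a list of (path, error) pairs.
    """
    kept: List[str] = []
    vectors: List[np.ndarray] = []
    failures: List[Tuple[str, str]] = []
    for path in image_paths:
        try:
            # encode() receives a one-element batch; keep its single row.
            vector = np.asarray(encode([load(path)]))[0]
        except Exception as exc:  # record any per-image failure and move on
            failures.append((path, f"{type(exc).__name__}: {exc}"))
            continue
        kept.append(path)
        vectors.append(vector)
    embeddings = np.vstack(vectors) if vectors else np.empty((0, 0))
    return kept, embeddings, failures


def load_rgb(path: str):
    """Open an image file and return a fully loaded RGB copy (hypothetical helper)."""
    from PIL import Image  # imported lazily so embed_images_checked needs no Pillow

    with Image.open(path) as img:
        return img.convert("RGB")


def fit_multilingual(
    image_paths: Sequence[str],
    docs: Sequence[str],
    image_model_name: str = "clip-ViT-B-32",
    text_model_name: str = "clip-ViT-B-32-multilingual-v1",
):
    """Embed images directly, then hand the vectors to ConceptModel."""
    # Heavy dependencies are imported here so nothing loads at import time.
    from sentence_transformers import SentenceTransformer
    from concept import ConceptModel

    image_model = SentenceTransformer(image_model_name)
    kept, embeddings, failures = embed_images_checked(
        image_paths, encode=image_model.encode, load=load_rgb
    )
    for path, error in failures:
        print(f"Could not embed {path}: {error}")
    if not kept:
        raise RuntimeError("No image could be embedded; check the model first.")

    # Only the kept paths are passed, so paths and embedding rows stay aligned.
    # Assumption: with image_embeddings supplied, ConceptModel uses its
    # embedding_model for the text side (docs).
    concept_model = ConceptModel(embedding_model=text_model_name)
    concepts = concept_model.fit_transform(
        list(kept), list(docs), image_embeddings=embeddings
    )
    return concept_model, concepts
```

For example, `fit_multilingual(images, docs)` embeds the images with `clip-ViT-B-32`. Passing `image_model_name="clip-ViT-B-32-multilingual-v1"` instead shows whether that model fails on every image or only on some of them.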