Multilingual support - Githubissues

scr255 commented 1 year ago

Code for English:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(images, docs)
# Works correctly!

Guide suggests "Use Concept(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages.":

from concept import Concept
# ImportError: cannot import name 'Concept' from 'concept' --> I guess you mean to import ConceptModel

Importing ConceptModel:

from concept import ConceptModel
concept_model = ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1")
concepts = concept_model.fit_transform(images, docs)
# TypeError: 'JpegImageFile' object is not subscriptable

MaartenGr commented 1 year ago

Hmmm, there might be something going wrong with the images that you pass to the model. Did the code work for you with the English version?

scr255 commented 1 year ago

> Hmmm, there might be something going wrong with the images that you pass to the model. Did the code work for you with the English version?

Yes, the English model "clip-ViT-B-32" is working fine, while "clip-ViT-B-32-multilingual-v1" throws the error.

I've tried changing the dataset (all images in .jpeg format), and the same problem happens.

MaartenGr commented 1 year ago

Unfortunately, there seems to be an issue with how that specific model processes the images. You could try embedding the images with SentenceTransformers directly and then passing the embeddings to fit_transform through the image_embeddings parameter. That way, you can also check whether a specific image in your dataset is causing the problem.
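A minimal sketch of that workaround, not a confirmed fix. It assumes `images` is a list of file paths, and that images should be encoded with the original "clip-ViT-B-32" checkpoint: as far as the sentence-transformers model card describes it, "clip-ViT-B-32-multilingual-v1" is a text encoder aligned to the same vector space and is meant to be paired with "clip-ViT-B-32" for images, which may be why passing it a JPEG fails. The helper names `embed_one_by_one` and `fit_with_precomputed_embeddings` are made up for this example; only `image_embeddings` comes from the thread. Encoding one image at a time is slower than batching, but it shows exactly which file fails.

```python
from typing import Any, Callable, List, Sequence, Tuple


def embed_one_by_one(
    items: Sequence[Any],
    encode_one: Callable[[Any], Any],
) -> Tuple[List[Any], List[int], List[Tuple[int, str]]]:
    """Encode each item separately so a single bad image is easy to spot.

    Returns (vectors, kept_indices, failures), where failures holds
    (index, error message) pairs for items that could not be encoded.
    """
    vectors, kept, failures = [], [], []
    for index, item in enumerate(items):
        try:
            vectors.append(encode_one(item))
            kept.append(index)
        except Exception as exc:  # record the failure and keep going
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return vectors, kept, failures


def fit_with_precomputed_embeddings(image_paths, docs):
    """Hypothetical wiring; needs numpy, Pillow, sentence-transformers, concept."""
    import numpy as np
    from PIL import Image
    from sentence_transformers import SentenceTransformer
    from concept import ConceptModel

    # Assumption: images go through the original CLIP checkpoint, text through
    # the multilingual one, since both are meant to share one vector space.
    image_encoder = SentenceTransformer("clip-ViT-B-32")
    vectors, kept, failures = embed_one_by_one(
        image_paths, lambda path: image_encoder.encode(Image.open(path))
    )
    for index, error in failures:
        print(f"image {index} ({image_paths[index]}) failed: {error}")
    if not vectors:
        raise RuntimeError("no image could be embedded")

    # Keep only the images that were embedded, in the same order as the rows.
    kept_paths = [image_paths[i] for i in kept]
    image_embeddings = np.vstack(vectors)

    concept_model = ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1")
    return concept_model.fit_transform(
        kept_paths, docs=docs, image_embeddings=image_embeddings
    )
```

Calling `fit_with_precomputed_embeddings(images, docs)` would then replace the failing `fit_transform(images, docs)` call. If every image embeds cleanly here, the dataset is probably fine and the error comes from the multilingual model being handed images.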

MaartenGr / Concept

Multilingual support #15