MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

Index Error: index out of bounds error for visualize concepts #10

Open vvkishere opened 2 years ago

vvkishere commented 2 years ago

I ran the sample code on a custom dataset and got the following error when I tried to visualize the concepts. Any help would be appreciated. [image]

MaartenGr commented 2 years ago

Most likely, you are using a top_n that is larger than the actual number of concepts in your data. I would advise using concept_model.visualize_concepts(top_n=X), where X is equal to or smaller than the number of concepts in your data.
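One way to pick a safe X is to count the concepts first and cap top_n at that number. The snippet below is only a sketch: `safe_top_n` is a hypothetical helper, the `labels` list is made-up example data with one cluster label per image, and it assumes the HDBSCAN convention that `-1` marks outliers rather than a concept.

```python
def safe_top_n(labels, requested_top_n):
    """Cap requested_top_n at the number of distinct non-outlier clusters."""
    # Assumption: label -1 marks outliers and does not count as a concept.
    n_concepts = len({label for label in labels if label != -1})
    return min(requested_top_n, n_concepts)


# Made-up per-image cluster labels: three concepts (0, 1, 2) plus outliers.
labels = [-1, 0, 0, 1, 1, 1, 2, -1, 2, 0]
top_n = safe_top_n(labels, requested_top_n=9)
print(top_n)  # 3

# The capped value can then be passed on, as suggested above:
# concept_model.visualize_concepts(top_n=top_n)
```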

vvkishere commented 2 years ago

Thank you. I shall check this and get back to you.

vvkishere commented 2 years ago

Hi.

I am now getting this error on a dataset of around 100 jpg images.


ValueError                                Traceback (most recent call last)
in ()
----> 1 clusters = concept_model.fit_transform(images = img_names, image_embeddings = concepts)

5 frames
/usr/local/lib/python3.7/dist-packages/concept/_model.py in fit_transform(self, images, docs, image_names, image_embeddings)
    129             representative_images)
    130         selected_exemplars = self._extract_exemplar_subset(exemplar_embeddings,
--> 131             representative_images)
    132
    133         # Create collective representation of images

/usr/local/lib/python3.7/dist-packages/concept/_model.py in _extract_exemplar_subset(self, exemplar_embeddings, representative_images)
    358             diversity=self.diversity,
    359             top_n=8)
--> 360         for index, cluster in enumerate(self.cluster_labels[1:])}
    361
    362         return selected_exemplars

/usr/local/lib/python3.7/dist-packages/concept/_model.py in (.0)
    358             diversity=self.diversity,
    359             top_n=8)
--> 360         for index, cluster in enumerate(self.cluster_labels[1:])}
    361
    362         return selected_exemplars

/usr/local/lib/python3.7/dist-packages/concept/_mmr.py in mmr(cluster_embedding, image_embeddings, indices, top_n, diversity)
     45         # Calculate MMR
     46         mmr = (1-diversity) * candidate_similarities - diversity * target_similarities.reshape(-1, 1)
---> 47         mmr_idx = candidates_idx[np.argmax(mmr)]
     48
     49         # Update images & candidates

<__array_function__ internals> in argmax(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in argmax(a, axis, out)
   1193
   1194     """
-> 1195     return _wrapfunc(a, 'argmax', axis=axis, out=out)
   1196
   1197

/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     55
     56     try:
---> 57         return bound(*args, **kwds)
     58     except TypeError:
     59         # A TypeError occurs if the object does have such a method in its

ValueError: attempt to get argmax of an empty sequence

Any idea how I can fix this? It seems like the cluster labels are not being generated, or the list is empty. Thank you.

MaartenGr commented 2 years ago

I am not entirely sure, but I believe this is a result of having a small dataset. To diversify the concepts, I think min_concept_size needs to be at least 8, since the diversification step uses the top 8 images per concept. Could you share how you have initialized Concept?
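To check whether small clusters are the likely cause, the cluster sizes can be compared against the 8 images mentioned above. This is a rough diagnostic sketch, not part of Concept: `undersized_clusters` is a hypothetical helper, the labels are made-up data, and `-1` is again assumed to mark outliers.

```python
from collections import Counter


def undersized_clusters(labels, required=8):
    """Return {cluster: size} for clusters with fewer than `required` members."""
    # Assumption: label -1 marks outliers, so it is ignored here.
    sizes = Counter(label for label in labels if label != -1)
    return {cluster: size for cluster, size in sizes.items() if size < required}


# Made-up labels: cluster 0 has 12 images, cluster 1 has 5, cluster 2 has 8.
labels = [0] * 12 + [1] * 5 + [2] * 8 + [-1] * 3
small = undersized_clusters(labels)
print(small)  # {1: 5}
```

Any cluster reported here has too few images for a top-8 selection and would be a candidate for the empty-sequence error.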

vvkishere commented 2 years ago

I initialized it using the sample code. I think the number of images was low (around 100). Do you think that is the issue? I tried reducing min_concept_size to 5, but that does not seem to work. I think it is because the top_n argument used when building selected_exemplars runs into an error. From the code, I don't think I can influence that argument via the API.

MaartenGr commented 2 years ago

It might indeed be that the number of images is relatively low and that I did not account for that in all places. Since we want to extract representative pictures, there needs to be a minimum of 8-10 images in each cluster. My guess then would be that min_concept_size should be at least 10.
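For illustration, here is how a maximal marginal relevance (MMR) selection can guard against clusters with fewer images than top_n. This is a simplified sketch and not the code in concept/_mmr.py: `mmr_select` and `cosine` are hypothetical names, it uses plain Python lists instead of NumPy, and the embeddings are made up. The key line caps top_n at the number of candidates, so the selection never runs on an empty candidate set.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(target, candidates, top_n=8, diversity=0.5):
    """Return indices of up to top_n candidates chosen by MMR."""
    if not candidates:
        return []
    # Guard: never ask for more items than the cluster contains.
    top_n = min(top_n, len(candidates))

    relevance = [cosine(target, c) for c in candidates]
    # Start with the candidate most similar to the target.
    selected = [max(range(len(candidates)), key=relevance.__getitem__)]
    remaining = [i for i in range(len(candidates)) if i != selected[0]]

    while len(selected) < top_n:
        def score(i):
            # Penalize similarity to anything already selected.
            redundancy = max(cosine(candidates[i], candidates[j]) for j in selected)
            return (1 - diversity) * relevance[i] - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up 2-D embeddings: a cluster of only 5 images, with top_n=8 requested.
cluster_center = [1.0, 0.0]
images = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [0.7, 0.7], [1.0, 0.0]]
picked = mmr_select(cluster_center, images, top_n=8)
print(len(picked))  # 5
```

With a guard like this, a cluster of 5 images simply yields 5 exemplars instead of raising an error. Until something similar exists in the library, raising min_concept_size as suggested above remains the workaround.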

vvkishere commented 2 years ago

Okay got it. Thank you.