Closed amrakm closed 1 year ago
Apologies for the late reply! It seems that there were no outliers found, which happens very rarely. I'll make sure that it gets fixed!
I just pushed a fix to the main branch which should hopefully solve your issue!
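For anyone else landing here, this is a minimal sketch of how that exact message can arise in pandas when no outlier label is present. It is not Concept's actual code: the `cluster_sizes` table and its column are hypothetical, and it assumes the clusterer marks outliers with the label -1 (as HDBSCAN does).

```python
import pandas as pd

# Hypothetical per-cluster table indexed by cluster label.
# Outliers would normally appear under label -1, but here none were found.
cluster_sizes = pd.DataFrame({"size": [40, 25, 10]}, index=[0, 1, 2])

# Dropping a label that is not in the index raises KeyError.
try:
    cluster_sizes.drop(-1)
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(error_message)

# Tolerant variant: errors="ignore" skips labels that are absent.
without_outliers = cluster_sizes.drop(-1, errors="ignore")
print(without_outliers.index.tolist())
```

Passing `errors="ignore"` is one common way to make such a drop safe when the outlier cluster may or may not exist; the fix on the main branch may be implemented differently.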
Hi @MaartenGr.
I tried to use Concept in Google Colab. I installed it with pip and the concept version is 0.2.1. I still get the error KeyError: '[-1] not found in axis' when I use a particular dataset. Any ideas on what might be the issue?
Thank you for your time and help.
@bakachan19 If you install it through the main branch, it should have the fix for the error you are getting.
Oh, I see. Thanks a lot @MaartenGr.
Hi @MaartenGr. I apologize for bothering you again. I installed the concept package from the main branch and made sure the scikit-learn version is compatible. I no longer get the previous error, but I do get several different ones depending on the value of min_concept_size. I am using the default concept model configuration; I only change min_concept_size.
56 try:
---> 57 return bound(*args, **kwds)
58 except TypeError:
59 # A TypeError occurs if the object does have such a method in its
ValueError: attempt to get argmax of an empty sequence
- with min_concept_size = 10, I get this one:
/usr/local/lib/python3.9/dist-packages/concept/_model.py in
354
--> 355 selected_exemplars = {cluster: mmr(self.cluster_embeddings[cluster],
356 exemplar_embeddings[cluster],
357 representative_images[cluster]["Indices"],
IndexError: list index out of range
Thank you for your time and help.
@bakachan19 Strange, I am not entirely sure what is happening. Could you share your full code and the versions of the packages in your environment? I will look into this, but in the meantime there is an option to use images with BERTopic that should provide similar, albeit not the same, functionality.
@MaartenGr I managed to make it work with a different UMAP configuration: by changing n_neighbors from 15 to a smaller number like 5, I was able to run the code with min_concept_size = 10. I think my data is unusual, and with some configurations the model either does not find any clusters or clusters everything together... For the environment setup I use Google Colab with the following installation steps:
pip install scikit-learn==0.24.2
pip install git+https://github.com/MaartenGr/Concept.git
and then I just used the code provided in the tutorial:
from concept import ConceptModel
from umap import UMAP

umap_model = UMAP(n_neighbors=5, n_components=5, min_dist=0.0, metric='cosine', random_state=5, low_memory=False)
concept_model = ConceptModel(min_concept_size=10, umap_model=umap_model)
concepts = concept_model.fit_transform(images_name, docs=all_nouns)
Thank you for your time! Have a great day.
Glad to hear that you solved the issue and thanks for sharing your solution. This will definitely help others having the same issue.
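For others debugging the same symptoms, here is a rough sketch of how one might check whether a clustering came out degenerate (no clusters at all, or everything in one cluster). The `summarize_labels` helper and the example label arrays are hypothetical and not part of Concept; the sketch assumes the clusterer returns one integer label per item, with -1 meaning outlier. It also shows that NumPy raises the "empty sequence" error from the traceback above when `argmax` receives an empty array.

```python
import numpy as np

def summarize_labels(labels):
    """Summarize cluster labels, treating -1 as the outlier label (assumption)."""
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels[labels != -1])
    return {
        "n_clusters": int(cluster_ids.size),
        "n_outliers": int(np.sum(labels == -1)),
        "no_clusters": bool(cluster_ids.size == 0),
        "single_cluster": bool(cluster_ids.size == 1),
    }

# Hypothetical outcomes: everything is an outlier, or everything is one cluster.
all_outliers = summarize_labels([-1, -1, -1, -1])
one_cluster = summarize_labels([0, 0, 0, 0])
print(all_outliers)
print(one_cluster)

# argmax over an empty array fails, e.g. when a cluster has no candidates.
try:
    np.argmax(np.array([]))
    argmax_error = None
except ValueError as exc:
    argmax_error = str(exc)
print(argmax_error)
```

If a check like this reports no clusters or a single cluster, adjusting `n_neighbors` or `min_concept_size` as described above is a reasonable first thing to try.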
I tried the demo code and it worked for a small sample, but when I fed it more images I got this error:
KeyError: '[-1] not found in axis'
dependencies: concept==0.2.1, pandas==1.4.0