Open mjaved-nz opened 9 months ago
It seems like it cannot iterate over the documents for whatever reason. Did you make sure that all documents are non-empty? That also means that very short documents that contain, for instance, only "\n" should also be removed.
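The check suggested above could be sketched in plain Python as follows. The helper name `drop_blank_documents` is hypothetical and not part of BERTopic; the sketch only assumes that `docs` is a list of strings.

```python
def drop_blank_documents(docs):
    """Return only documents that contain at least one non-whitespace character."""
    # str.strip() removes leading and trailing whitespace, so "\n" or "   "
    # becomes "" and is filtered out. Non-string entries are dropped as well.
    return [doc for doc in docs if isinstance(doc, str) and doc.strip()]


docs = ["A real document about topic modeling.", "\n", "", "   ", "Another document."]
clean_docs = drop_blank_documents(docs)
print(len(docs), "->", len(clean_docs))
```

Note that this only measures emptiness in characters; a document that passes this filter could still be handled differently by a model's tokenizer.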
I don't have any empty documents; the minimum length of the documents is 21. The same set of documents works fine with other LLMs.
> I don't have any empty documents; the minimum length of the documents is 21.
How did you calculate the length of the document? Tokenization schemes of the underlying model might handle certain documents differently.
> The same set of documents works fine with other LLMs.
Which other LLMs did you try? Did it work with TextGeneration or something else?
Also, on how many documents did you train your model? It might be that there are only a couple of documents per topic and that it might not properly return a document.
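One rough way to see whether some topics have only a couple of documents is to count the per-document topic assignments. This is a sketch, assuming `topics` is the list of topic ids returned by `fit_transform`; the helper `small_topics` and the `min_docs` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter


def small_topics(topics, min_docs=3):
    """Return {topic_id: count} for topics with fewer than min_docs documents."""
    counts = Counter(topics)
    return {topic: n for topic, n in counts.items() if n < min_docs}


# Example topic assignments, one entry per document (made-up values).
topics = [0, 0, 0, 0, 1, 1, 2]
print(small_topics(topics))
```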
Hi @MaartenGr - thank you for creating this fantastic library.
I think the cause is that when the DEFAULT_PROMPT is used (which has no [DOCUMENTS]) or a user-supplied prompt does not contain "[DOCUMENTS]", the docs in repr_docs_mappings are all assigned a value of None. The error occurs when trying to iterate over None.
I created a pull request to address this. Please review and merge if you find this to be a suitable fix.
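The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual change from the pull request: `truncate`, `prepare_docs_unsafe`, and `prepare_docs_guarded` are hypothetical stand-ins, and the mapping is only shaped like the one described (topic id mapped to a list of documents, or `None`).

```python
def truncate(doc, max_chars=20):
    """Hypothetical stand-in for a document truncation helper."""
    return doc[:max_chars]


def prepare_docs_unsafe(docs):
    # Iterating over None raises TypeError, as in the reported traceback.
    return [truncate(doc) for doc in docs]


def prepare_docs_guarded(docs):
    # Skip truncation when no representative documents were collected.
    if docs is None:
        return None
    return [truncate(doc) for doc in docs]


# Topic 0 has no documents because the prompt lacks a "[DOCUMENTS]" placeholder.
repr_docs_mappings = {0: None, 1: ["a long representative document about cats"]}

error_message = None
try:
    prepare_docs_unsafe(repr_docs_mappings[0])
except TypeError as err:
    error_message = str(err)
    print("unsafe:", error_message)

guarded = {topic: prepare_docs_guarded(docs) for topic, docs in repr_docs_mappings.items()}
print("guarded:", guarded)
```

The guard keeps `None` as-is, so whatever builds the prompt afterwards would still need to accept a missing document list.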
Hi @MaartenGr,
I hope you are doing well. I am getting the following error when using the flan-t5 model for topic representation. Any solution for this? Thanks
Error:
----> 2 topics, probs = topic_model_t5.fit_transform(docs)
      3 print(topic_model_t5.get_topic_info())

/usr/local/lib/python3.10/dist-packages/bertopic/representation/_textgeneration.py in extract_topics(self, topic_model, documents, c_tf_idf, topics)
    145
    146             # Prepare prompt
--> 147             truncated_docs = [truncate_document(topic_model, self.doc_length, self.tokenizer, doc) for doc in docs]
    148             prompt = self._create_prompt(truncated_docs, topic, topics)
    149             self.prompts.append(prompt)

TypeError: 'NoneType' object is not iterable