Open deter3 opened 9 months ago
Hey, could you provide more of your code? It looks like the issue here is that the documents
argument that reaches the preprocessor is None, so it would be helpful to figure out how that happened!
@bclavie - I faced a similar issue when I ran the notebook 06-index_free_use.ipynb from examples.
I tried to create a reproducer in Colab and got:
ValidationError: 1 validation error for Document
text
none is not an allowed value (type=type_error.none.not_allowed)
The root cause is that one of the pages is empty, so ragatouille raises a ValidationError, which is the right behavior.
Users need to make sure they pass only valid docs to the Corpus_Processor.process_corpus
method. This is not a bug and the issue can be closed.
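For anyone hitting the same thing, here is a minimal sketch of the filtering step described above. It is plain Python and does not touch ragatouille itself; the helper name drop_empty_documents and the sample pages list are made up for illustration. It assumes your documents are meant to be a list of strings, where some entries may be None or blank (for example, an empty PDF page).

```python
from typing import Iterable, List, Optional


def drop_empty_documents(documents: Optional[Iterable[Optional[str]]]) -> List[str]:
    """Keep only entries that are non-blank strings (hypothetical helper)."""
    if documents is None:
        # Fail early with a clear message instead of a TypeError deep in the splitter.
        raise ValueError("documents is None; expected a list of strings")
    return [doc for doc in documents if isinstance(doc, str) and doc.strip()]


# Illustrative input: one page came back as None and another is whitespace only.
pages = ["first page text", None, "   ", "second page text"]
clean_pages = drop_empty_documents(pages)
print(clean_pages)
```

The cleaned list can then be passed to process_corpus in place of the raw one. Raising on a None input is a design choice here: it surfaces the upstream loading problem at the call site rather than hiding it.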
ragatouille 0.0.4b2, Ubuntu 22.04
When I run the sample code, documents is just a list of strings.
Traceback (most recent call last):
  File "/workspace/three_methods_ranking2.py", line 160, in
    my_documents = processor.process_corpus(documents)
  File "/usr/local/lib/python3.10/dist-packages/ragatouille/data/corpus_processor.py", line 22, in process_corpus
    documents = self.document_splitter_fn(documents, **splitter_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ragatouille/data/preprocessors.py", line 9, in llama_index_sentence_splitter
    docs = [[Document(text=doc)] for doc in documents]
TypeError: 'NoneType' object is not iterable