I'm building an API with Flask. The client sends me a file, and I save it to a Chroma database on my side. Chroma.add terminates my program without raising any exception. Saving a smaller file works fine, but sending a larger file makes the process crash. At first I thought it might be a memory problem, so I tested the same code in a Jupyter notebook outside Flask, and there it runs properly.
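One difference between the two environments described above is threading. A Jupyter cell runs on the kernel's main thread, while Flask's development server (`app.run()`, which defaults to `threaded=True`) handles each request on a worker thread. Whether that difference matters here is an assumption to test, not a known cause. A minimal way to check is to call the same method from a plain `threading.Thread` outside Flask. `run_in_worker_thread` is a hypothetical helper, not part of any library:

```python
import threading
from typing import Any, Callable, Dict


def run_in_worker_thread(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func(*args, **kwargs) on a new thread and return its result.

    Roughly mimics a threaded server handling a request off the main thread.
    If func raises, threading's excepthook reports it and this returns None.
    """
    result: Dict[str, Any] = {}

    def target() -> None:
        result["value"] = func(*args, **kwargs)

    worker = threading.Thread(target=target, name="request-like-worker")
    worker.start()
    worker.join()  # block until func finishes (or the whole process dies)
    return result.get("value")
```

With `store` standing in for whatever object owns `save_w_chunking` (a hypothetical name), compare `store.save_w_chunking(docs)` with `run_in_worker_thread(store.save_w_chunking, docs)` on the same large file. If only the threaded call dies, starting the server with `app.run(threaded=False)` may be another quick way to confirm it.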
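import uuid
from typing import List

# Assumed import locations; the original snippet omitted its imports.
from langchain_core.documents import Document
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.vectorstores.utils import filter_complex_metadata


def save_w_chunking(self, docs: List[Document]) -> None:
    # Split sentences after "。", "?", "!" or a newline, then merge them semantically.
    text_splitter = SemanticChunker(
        self._embeddings,
        breakpoint_threshold_type="percentile",
        breakpoint_threshold_amount=80,
        sentence_split_regex=r'(?<=[。?!])|(?<=\n)',
    )
    docs = text_splitter.split_documents(docs)

    # Drop duplicate and whitespace-only chunks; prefix each kept chunk with its filename.
    seen_contents = set()  # set gives O(1) membership checks instead of a list scan
    temp_docs = []
    for d in docs:
        is_unique = d.page_content not in seen_contents
        has_content = len(d.page_content.strip()) > 0  # strip() already removes "\n"
        if is_unique and has_content:
            seen_contents.add(d.page_content)
            d.page_content = d.metadata["filename"] + ":\n" + d.page_content
            temp_docs.append(d)
    docs = filter_complex_metadata(temp_docs)
    if not docs:
        return

    try:
        t = [d.page_content for d in docs]
        m = [d.metadata for d in docs]
        ids = [str(uuid.uuid4()) for _ in range(len(t))]
        # self._ChromaDB is assumed to be a chromadb Collection.
        self._ChromaDB.add(ids=ids, documents=t, metadatas=m)
    except Exception as e:
        # Only Python exceptions land here; a hard crash inside native code
        # ends the process before this handler can run.
        print("caught exception: ", e)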
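Since the crash depends on file size, one way to narrow it down (a debugging assumption, not a known fix) is to add the chunks in smaller batches and log progress before each call, flushing stdout so the last printed line shows which batch was in flight when the process died. `batches` and `add_in_batches` are hypothetical helpers; `collection` is anything with the `add(ids=..., documents=..., metadatas=...)` method used above, such as `self._ChromaDB`. The batch size of 100 is an arbitrary starting value:

```python
import uuid
from typing import Any, Dict, Iterator, List, Sequence, Tuple


def batches(n_items: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs covering range(n_items) in steps of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)


def add_in_batches(
    collection: Any,
    texts: Sequence[str],
    metadatas: Sequence[Dict[str, Any]],
    batch_size: int = 100,  # arbitrary starting value; lower it to bisect
) -> List[str]:
    """Add texts to a collection in fixed-size batches and return the generated ids."""
    if len(texts) != len(metadatas):
        raise ValueError("texts and metadatas must have the same length")
    ids = [str(uuid.uuid4()) for _ in texts]
    for start, end in batches(len(texts), batch_size):
        # flush=True so this line reaches the console even if the process dies next
        print(f"adding items {start}..{end - 1} of {len(texts)}", flush=True)
        collection.add(
            ids=ids[start:end],
            documents=list(texts[start:end]),
            metadatas=list(metadatas[start:end]),
        )
    return ids
```

Inside `save_w_chunking`, the single `self._ChromaDB.add(...)` call would become `add_in_batches(self._ChromaDB, t, m)`. If small batches succeed where one large call crashes, the size of a single call is implicated; if it still dies at the same batch, the content of that batch is worth inspecting.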
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
Press CTRL+C to quit
successfully parsed hello.docx
(llm) C:\Users\Desktop>
What happened?
Versions
Python 3.12.3, chromadb 0.5.0, langchain-chroma 0.1.1