Closed TivonJJ closed 2 months ago
But the finalDocs are the split chunks; they shouldn't be the culprit causing the 413 Request Entity Too Large error.
But the number of chunks is still a problem. For example, Chroma has a maximum batch size of 166, and when I printed the length of finalDocs it was 419.
I changed the code as follows and the problem was solved, but I don't know whether this fix has any side effects.
packages/components/nodes/vectorstores/Chroma/Chroma.ts > upsert
```typescript
// Original code:
// try {
//     await ChromaExtended.fromDocuments(finalDocs, embeddings, obj)
// } catch (e) {
//     throw new Error(e)
// }

// Modified to the following: split the array into batches
function chunk(array: any[], size: number) {
    const chunkedArray: any[] = []
    for (let i = 0; i < array.length; i += size) {
        chunkedArray.push(array.slice(i, i + size))
    }
    return chunkedArray
}

// Upsert each batch in a loop instead of sending everything at once
const chunkedArray = chunk(finalDocs, 100)
for (const arr of chunkedArray) {
    try {
        await ChromaExtended.fromDocuments(arr, embeddings, obj)
    } catch (e) {
        throw new Error(e)
    }
}
```
Storing in batches like this avoids both exceeding the maximum batch size and sending overly large requests, since the data is split into smaller pieces.
Describe the bug
I'm using the Cheerio Web Scraper to fetch a website's content, and I get an error when storing it into the vector store: "exceeds maximum batch" from Chroma DB.
I looked into this and found that vector stores all have batch limits, like Chroma and Pinecone. I tried reducing the number of chunks and increasing the size of each chunk, but then I got a new error: "413 Request Entity Too Large" from the vector DB.
Then I checked the code in
packages/components/nodes/vectorstores/Chroma/Chroma.ts
and found that finalDocs was too big. Because a website's knowledge base can be very large, would it be possible to split it into multiple batches and store them in a loop here? Or is there another way to solve this problem?
Screenshots
The flow (screenshot omitted)
Chroma logs (screenshot omitted)