Open Arslan-Soomro opened 3 days ago
Hi, @Arslan-Soomro! Does that happen with any query? Or is it some specific query? I'll try to take a look at it today and get back to you.
I am not sure about the query. I only query all files from this bucket, so right now list-files is the only query I am performing from my side.
@Arslan-Soomro I found your data, but I couldn't figure out the problem yet. Here are a few things you can share to make it easier to debug:
- Set the logger to "debug" and share the server-side logs
- Share the exact time that the error happened so I can search for it in the logs
- Share your bucket configuration (the edgestore router file)
- Share the query that you are trying to run
Server Side Logs
Exact Time Sun Nov 10 2024 00:42:26 GMT+0500
Bucket Configuration (for the erroneous bucket only, there are more buckets in configuration though)
const edgeStoreRouter = es.router({
  adminLegalKbFiles: es
    .fileBucket({
      maxSize: 10 * 1024 * 1024, // 10MB
      accept: ["text/plain"],
    })
    .input(
      z.object({
        category: z.string(),
        name: z.string(),
        fileText: z.string(),
        id: z.string(), // This is the id of the file
        tokens: z.number(),
      })
    )
    // .path(() => [])
    .metadata(({ ctx, input }) => ({
      id: input.id,
      category: input.category,
      name: input.name,
      tokens: input.tokens.toString(),
    }))
    .beforeUpload(async ({ ctx, input, fileInfo }) => {
      console.log(
        `[edgestore-adminLegalKbFiles] beforeUpload, Name: ${input.name}, Tokens: ${input.tokens}`
      );
      if (!ctx.adminId) return false;
      const { fileText } = input;
      const res = await embedAndStoreText(
        fileText,
        {
          fileId: input.id,
          category: input.category,
          fileName: input.name,
          fileBucketName: "adminLegalKbFiles",
        },
        { namespace: "admin" }
      );
      if (res.error) {
        return false;
      }
      return true;
    })
    .beforeDelete(async ({ ctx, fileInfo }) => {
      console.log("[edgestore-adminLegalKbFiles] beforeDelete", ctx, fileInfo);
      if (!ctx.adminId) return false;
      // Delete Associated Embeddings
      const res = await deleteEmbeddingsByFileId(fileInfo.metadata.id, "admin");
      if (res.error) return false;
      return true;
    }),
});
Query
const res = await esBackendClient["adminLegalKbFiles"].listFiles({
  pagination: {
    currentPage,
    pageSize: 100,
  },
});
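For reference, paging through a whole bucket with a query like this one could look roughly as follows. This is a minimal sketch: `ListFilesFn`, `fetchAllFiles`, and `makeFakeListFiles` are hypothetical names, and the response shape (`data` plus `pagination.totalPages`) is an assumption about what `listFiles` returns, so check it against the EdgeStore docs before relying on it.

```typescript
// Assumed response shape; verify against the real backend client.
type ListFilesResponse<T> = {
  data: T[];
  pagination: { currentPage: number; totalPages: number; totalCount: number };
};

type ListFilesFn<T> = (args: {
  pagination: { currentPage: number; pageSize: number };
}) => Promise<ListFilesResponse<T>>;

// Requests pages one after another until the reported last page is reached.
async function fetchAllFiles<T>(
  listFiles: ListFilesFn<T>,
  pageSize = 100,
): Promise<T[]> {
  const all: T[] = [];
  let currentPage = 1;
  let totalPages = 1;
  do {
    const res = await listFiles({ pagination: { currentPage, pageSize } });
    all.push(...res.data);
    totalPages = res.pagination.totalPages;
    currentPage += 1;
  } while (currentPage <= totalPages);
  return all;
}

// Hypothetical in-memory stand-in for the backend client, for demonstration only.
function makeFakeListFiles(urls: string[]): ListFilesFn<{ url: string }> {
  return async ({ pagination: { currentPage, pageSize } }) => {
    const start = (currentPage - 1) * pageSize;
    return {
      data: urls.slice(start, start + pageSize).map((url) => ({ url })),
      pagination: {
        currentPage,
        totalPages: Math.max(1, Math.ceil(urls.length / pageSize)),
        totalCount: urls.length,
      },
    };
  };
}
```

If the real response matches the assumed shape, the bucket's own `listFiles` could be passed in place of the fake, wrapped in an arrow function so it keeps its client context.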
@Arslan-Soomro I figured out the issue, and I’m sorry for the hassle.
Basically, it’s a query performance issue, and your query is timing out. This seems to happen when the bucket has a lot of files, multiple path parameters and metadata keys. I played around with the query and found three things I can do to speed it up. I’m aiming to get the first improvement out soon. Hopefully today. With that change, this specific query (since it doesn’t use metadata or path params) should perform a lot better.
The main reason this query is a bit tricky is the flexibility to create any metadata key/value pairs. To search efficiently with that setup in MySQL, I have to do some 1:N joins, which can get heavy. I’m planning to eventually move the files to a different database for better flexible query performance. I’ll try MongoDB first since it lets us create a flexible JSON field with all keys indexed. If that doesn’t work, I’ll go with ElasticSearch, which I know can handle this well. Moving to a new DB is a future enhancement, though, and I don’t have a release date for that just yet.
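To illustrate why free-form metadata is expensive to filter, here is a small in-memory sketch. The `FileRow`/`MetadataRow`/`FileDoc` layouts and both function names are assumptions made for illustration, not EdgeStore's actual schema. With one row per key/value pair, every extra filter key needs another pass over the metadata rows (the equivalent of another join); with one document per file, a single pass is enough.

```typescript
type FileRow = { id: number; path: string };
type MetadataRow = { fileId: number; key: string; value: string };
type FileDoc = { id: number; path: string; metadata: Record<string, string> };

// Key/value layout: one lookup against the metadata rows per filter key.
function filterKeyValue(
  files: FileRow[],
  metadata: MetadataRow[],
  filter: Record<string, string>,
): FileRow[] {
  let candidates = files;
  for (const [key, value] of Object.entries(filter)) {
    // Roughly: JOIN metadata m ON m.fileId = f.id AND m.key = ? AND m.value = ?
    const matching = new Set(
      metadata
        .filter((m) => m.key === key && m.value === value)
        .map((m) => m.fileId),
    );
    candidates = candidates.filter((f) => matching.has(f.id));
  }
  return candidates;
}

// Document layout: all keys live on the file itself, so one pass suffices.
function filterDocs(docs: FileDoc[], filter: Record<string, string>): FileDoc[] {
  const entries = Object.entries(filter);
  return docs.filter((d) => entries.every(([k, v]) => d.metadata[k] === v));
}
```

Both functions return the same files for the same filter; the difference is how much work each additional metadata key adds.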
@Arslan-Soomro I just released the first improvement. Can you check if your issue is fixed for that query? It should be greatly improved. (From what I checked, your query was taking about 15s; now it should take less than 1s.)
You might still run into some issues when filtering with metadata and path params. The main problem in this case is the query to count the total number of files. I think I'll add a flag to ignore the totalCount, so that people can make more complex queries without running into timeout problems.
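One common way to page without a total count is to fetch one extra row and use it only to detect whether another page exists. The sketch below shows that idea on an in-memory array. `paginate` and the `skipTotalCount` option are hypothetical names chosen for illustration, not an existing EdgeStore flag.

```typescript
type Page<T> = {
  data: T[];
  hasNextPage: boolean;
  totalCount?: number; // omitted when the count is skipped
};

function paginate<T>(
  rows: T[],
  matches: (row: T) => boolean,
  opts: { currentPage: number; pageSize: number; skipTotalCount?: boolean },
): Page<T> {
  const { currentPage, pageSize, skipTotalCount = false } = opts;
  const offset = (currentPage - 1) * pageSize;

  if (!skipTotalCount) {
    // Counting requires evaluating the filter against every row.
    const all = rows.filter(matches);
    return {
      data: all.slice(offset, offset + pageSize),
      hasNextPage: offset + pageSize < all.length,
      totalCount: all.length,
    };
  }

  // Without a count, stop scanning once pageSize + 1 matches past the offset are found.
  const found: T[] = [];
  let seen = 0;
  for (const row of rows) {
    if (!matches(row)) continue;
    seen += 1;
    if (seen <= offset) continue;
    found.push(row);
    if (found.length > pageSize) break;
  }
  return { data: found.slice(0, pageSize), hasNextPage: found.length > pageSize };
}
```

The trade-off is that callers lose `totalCount` and `totalPages`, so a UI would show "next/previous" controls rather than numbered pages.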
I get "internal server error" only when querying one of the buckets. I have multiple buckets, and the others all work fine. This particular bucket was working great before; all of a sudden it just started giving me this error.
Error Log:
Edgestore Dashboard: