edgestorejs / edgestore

https://edgestore.dev

Internal Server Error #63

Open Arslan-Soomro opened 3 days ago

Arslan-Soomro commented 3 days ago

I get an "internal server error" only when querying one of my buckets. I have multiple buckets, and the others all work fine. This particular bucket worked fine before; all of a sudden, it started giving me this error.

Error Log:

image

Edgestore Dashboard:

image

perfectbase commented 3 days ago

Hi, @Arslan-Soomro! Does that happen with any query? Or is it some specific query? I'll try to take a look at it today and get back to you.

perfectbase commented 3 days ago

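@Arslan-Soomro I found your data, but I couldn't figure out the problem yet. Here are a few things you can share to make it easier to debug:

  • Set the logger to "debug" and share the server-side logs.
  • Share the exact time that the error happened so I can search for it in the logs.
  • Share your bucket configuration (the edgestore router file).
  • Share the query that you are trying to run.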

Arslan-Soomro commented 3 days ago

> Hi, @Arslan-Soomro! Does that happen with any query? Or is it some specific query? I'll try to take a look at it today and get back to you.

I am not sure about the query. I only list all files from this bucket, so right now list-files is the only query I am performing on my side.

Arslan-Soomro commented 3 days ago

> @Arslan-Soomro I found your data, but I couldn't figure out the problem yet. Here are a few things you can share to make it easier to debug:
>
>   • Set the logger to "debug" and share the server-side logs.
>   • Share the exact time that the error happened so I can search for it in the logs.
>   • Share your bucket configuration (the edgestore router file).
>   • Share the query that you are trying to run.

  1. Server Side Logs

    image
  2. Exact Time: Sun Nov 10 2024 00:42:26 GMT+0500

  3. Bucket Configuration (for the erroneous bucket only, there are more buckets in configuration though)

    const edgeStoreRouter = es.router({
      adminLegalKbFiles: es
        .fileBucket({
          maxSize: 10 * 1024 * 1024, // 10MB
          accept: ["text/plain"],
        })
        .input(
          z.object({
            category: z.string(),
            name: z.string(),
            fileText: z.string(),
            id: z.string(), // This is the id of the file
            tokens: z.number(),
          })
        )
        // .path(() => [])
        .metadata(({ ctx, input }) => ({
          id: input.id,
          category: input.category,
          name: input.name,
          tokens: input.tokens.toString(),
        }))
        .beforeUpload(async ({ ctx, input, fileInfo }) => {
          console.log(
            `[edgestore-adminLegalKbFiles] beforeUpload, Name: ${input.name}, Tokens: ${input.tokens}`
          );
          if (!ctx.adminId) return false;

          const { fileText } = input;

          const res = await embedAndStoreText(
            fileText,
            {
              fileId: input.id,
              category: input.category,
              fileName: input.name,
              fileBucketName: "adminLegalKbFiles",
            },
            { namespace: "admin" }
          );

          if (res.error) {
            return false;
          }

          return true;
        })
        .beforeDelete(async ({ ctx, fileInfo }) => {
          console.log("[edgestore-adminLegalKbFiles] beforeDelete", ctx, fileInfo);
          if (!ctx.adminId) return false;

          // Delete Associated Embeddings
          const res = await deleteEmbeddingsByFileId(fileInfo.metadata.id, "admin");
          if (res.error) return false;
          return true;
        }),
    });
  4. Query

    const res = await esBackendClient["adminLegalKbFiles"].listFiles({
      pagination: {
        currentPage,
        pageSize: 100,
      },
    });

perfectbase commented 2 days ago

@Arslan-Soomro I figured out the issue, and I’m sorry for the hassle.

Basically, it’s a query performance issue: your query is timing out. This seems to happen when the bucket has a lot of files, multiple path parameters, and multiple metadata keys. I played around with the query and found three things I can do to speed it up. I’m aiming to get the first improvement out soon, hopefully today. With that change, this specific query should perform a lot better, since it doesn’t use metadata or path params.

The main reason this query is tricky is the flexibility to create any metadata key/value pairs. To search efficiently with that setup in MySQL, I have to do some 1:N joins, which can get heavy. I’m planning to eventually move the files to a different database for better performance on flexible queries. I’ll try MongoDB first, since it lets us create a flexible JSON field with all keys indexed. If that doesn’t work, I’ll go with ElasticSearch, which I know can handle this well. Moving to a new DB is a future enhancement, though, and I don’t have a release date for it yet.
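To make the join cost concrete, here is a minimal sketch of how a filter over free-form key/value metadata tends to translate into SQL. The table and column names (`files`, `file_metadata`, `meta_key`, `meta_value`) and the `buildListFilesQuery` helper are assumptions made up for illustration; the thread does not show Edge Store's actual schema or query.

```typescript
// Hypothetical schema, for illustration only (not Edge Store's real tables):
//   files(id, bucket_name, path, uploaded_at)
//   file_metadata(file_id, meta_key, meta_value)  -- one row per key/value pair

type MetadataFilter = { [key: string]: string };

interface BuiltQuery {
  sql: string;
  params: string[];
}

// Builds a parameterized query. Each metadata key in the filter adds one
// more join against the key/value table, which is where the cost grows.
function buildListFilesQuery(bucket: string, filter: MetadataFilter): BuiltQuery {
  const lines: string[] = ["SELECT f.id, f.path FROM files f"];
  const params: string[] = [];

  Object.keys(filter).forEach((key, i) => {
    const alias = `m${i}`;
    lines.push(
      `JOIN file_metadata ${alias} ON ${alias}.file_id = f.id ` +
        `AND ${alias}.meta_key = ? AND ${alias}.meta_value = ?`
    );
    params.push(key, filter[key]);
  });

  lines.push("WHERE f.bucket_name = ?");
  lines.push("ORDER BY f.uploaded_at DESC");
  params.push(bucket);

  return { sql: lines.join("\n"), params };
}

// Filtering on two metadata keys produces two joins.
const query = buildListFilesQuery("adminLegalKbFiles", {
  category: "contracts",
  name: "nda.txt",
});
console.log(query.sql);
console.log(query.params);
```

With no metadata filter the query touches only the `files` table, which fits the observation above that a plain list query is the easiest case to speed up.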

perfectbase commented 2 days ago

@Arslan-Soomro I just released the first improvement. Can you check if your issue is fixed for that query? It should be greatly improved. (From what I checked, your query was taking about 15s; now it should take less than 1s.)

You might still run into some issues when filtering with metadata and path params. The main problem in this case is the query to count the total number of files. I think I'll add a flag to ignore the totalCount, so that people can make more complex queries without running into timeout problems.
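One common way to paginate without a total count is to fetch one row more than the page size and use the extra row only to tell whether another page exists. The sketch below shows that general technique against an in-memory array. It is not Edge Store's API: `listPageWithoutCount`, `FetchRows`, and `hasNextPage` are hypothetical names, and the flag mentioned above may end up working differently.

```typescript
interface Page<T> {
  items: T[];
  hasNextPage: boolean;
}

// Hypothetical data source: returns up to `limit` rows starting at `offset`.
type FetchRows<T> = (offset: number, limit: number) => T[];

function listPageWithoutCount<T>(
  fetchRows: FetchRows<T>,
  currentPage: number,
  pageSize: number
): Page<T> {
  const offset = (currentPage - 1) * pageSize;
  // Ask for one extra row; its presence means another page exists,
  // so no separate COUNT(*) query is needed.
  const rows = fetchRows(offset, pageSize + 1);
  return {
    items: rows.slice(0, pageSize),
    hasNextPage: rows.length > pageSize,
  };
}

// In-memory stand-in for the database: 250 fake file names.
const allFiles: string[] = [];
for (let i = 1; i <= 250; i++) {
  allFiles.push(`file-${i}.txt`);
}

const fetchFromMemory: FetchRows<string> = (offset, limit) =>
  allFiles.slice(offset, offset + limit);

const firstPage = listPageWithoutCount(fetchFromMemory, 1, 100);
const lastPage = listPageWithoutCount(fetchFromMemory, 3, 100);
console.log(firstPage.items.length, firstPage.hasNextPage);
console.log(lastPage.items.length, lastPage.hasNextPage);
```

The trade-off is that callers can no longer show "page X of Y", only whether a next page exists.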