LlamaEdge / rag-api-server

A RAG API server written in Rust following OpenAI specs
https://llamaedge.com/docs/user-guide/server-side-rag/quick-start
Apache License 2.0

Feature: Search capability #22

Open suryyyansh opened 2 weeks ago

suryyyansh commented 2 weeks ago

Pertains to WasmEdge #3504

This PR adds search capabilities to the RAG API Server, enabled through the search feature. It also uses the llamaedge-query-server to produce more targeted search queries, and gives the RAG API Server a fallback when retrieval finds no matches for a query.

juntao commented 2 weeks ago

Hello, I am a code review agent on flows.network. Here are my reviews of changed source code files in this PR.


Cargo.lock

Potential issues

N/A

Summary of changes

The following file has been updated with version changes for several packages:

  1. Significant Update: The "llama-core" package has been upgraded from version 0.16.0 to version 0.16.1. This update might involve new features, bug fixes or performance improvements.

  2. Moderate Update: Several packages have been updated to newer versions such as "bytemuck" (from 1.17.0 to 1.17.1), "cc" (from 1.1.14 to 1.1.15), and "chat-prompts" (from 0.12.0 to 0.13.0). These updates might include bug fixes, security patches or new features.

  3. Minor Update: Some packages have been updated for compatibility or dependency management such as "indexmap" (from 2.4.0 to 2.5.0), "object" (from 0.36.3 to 0.36.4), and "rgb" (from 0.8.48 to 0.8.50). These updates are less likely to contain significant changes but they're important for maintaining the stability and functionality of the system.

Cargo.toml

Potential issues

Issue 1: The exact-version constraints "=0.13.0" for chat-prompts and "=0.13.1" for endpoints are a potential problem. An "=" requirement stops Cargo from resolving to any newer release, so it might cause compatibility issues if these crates are updated in the future.

Issue 2: The "reqwest" crate is patched from an external repository, which could introduce unexpected behaviors or security vulnerabilities into the project. It's recommended to use the official version of reqwest unless necessary and well-tested.

Issue 3: The "socket2" crate is also patched from an external repository. This can lead to unexpected behavior or issues with the networking layer of the application, as this crate is used by many other crates in the dependency graph.

Summary of changes

  1. The version of the chat-prompts dependency has been updated from "=0.12.0" to "=0.13.0".

  2. The version of the llama-core dependency has been updated from "=0.16.0" to "=0.16.1".

  3. A new feature called "search" has been added. However, without further context or information about this feature, it's hard to summarize its importance and changes.

src/backend/ggml.rs

Potential issues

Issue 1: There's no error handling for the case when llama_core::models::models().await returns an error. This could result in a panic if not properly handled, as the response is unwrapped even though it might contain an error.

Issue 2: The server does not handle scenarios where deserializing requests or serializing responses may fail. If these operations return an error, they are logged and then ignored, leading to potential data loss or incorrect behavior.

Issue 3: The code doesn't check if the file name is present when uploading a file in files_handler. If the filename is not provided in the request headers, it directly returns an internal server error without proper handling. This could result in unexpected behavior for clients that don't provide a filename or provide an invalid one.

Summary of changes

This patch introduces a new feature that enables web search functionality if the query doesn't yield relevant results from the database. This is controlled by a search feature flag. The code changes include storing user queries under this feature for potential web searches and enabling web search when no points are retrieved from the database during error handling, with relevant logging messages.
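
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the code in the patch: `ContextSource`, `choose_context`, and the `search_enabled` parameter are hypothetical names, and the boolean stands in for the compile-time search feature so the logic can be exercised without Cargo features.

```rust
/// Hypothetical result of resolving context for a user query.
/// This type is an assumption for illustration, not part of the patch.
#[derive(Debug, PartialEq)]
pub enum ContextSource {
    /// Context built from points retrieved from the vector database.
    Rag(Vec<String>),
    /// No points were retrieved, so the stored user query goes to web search.
    WebSearch(String),
    /// No points were retrieved and search is not enabled.
    Unavailable,
}

/// Prefer retrieved points; fall back to web search only when nothing was
/// retrieved and search is enabled. In a real build the flag would come from
/// the `search` Cargo feature rather than a runtime argument.
pub fn choose_context(
    points: Vec<String>,
    user_query: &str,
    search_enabled: bool,
) -> ContextSource {
    if !points.is_empty() {
        return ContextSource::Rag(points);
    }
    if search_enabled {
        ContextSource::WebSearch(user_query.to_string())
    } else {
        ContextSource::Unavailable
    }
}
```

Keeping the decision in one small function makes it easy to log which path was taken, which matches the logging behavior the summary mentions.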

src/main.rs

Potential issues

This code appears to be a Rust implementation of an API server that integrates with LlamaEdge-RAG. The issues found in the source code are listed below:

  1. Inconsistent Handling of Command Line Arguments: The code checks if certain command line arguments, such as model_name, model_alias, ctx_size, batch_size, and prompt_template, have exactly two elements in the vector. If not, an error is returned. However, other important parameters like n_predict, n_gpu_layers, threads, and grammar are not subject to this check, which could lead to unexpected behavior or crashes if they do not contain expected values.

  2. Lack of Input Validation for URLs: The code validates the qdrant_url argument using the is_valid_url function, but it does not validate other URL-related inputs such as query_server_url. This could lead to issues when making requests to these URLs.

  3. Error Handling and Logging: The code returns errors as ServerError, which is then converted into a hyper::Error for HTTP response construction in the handle_request function. However, it would be beneficial if error messages could include more context or details about what exactly went wrong during execution. Additionally, certain important information like request and response details are not consistently logged at different points in the code, which makes debugging difficult.
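
For the URL validation point (item 2), one way to apply the same check to every URL argument is sketched below. This is a std-only illustration under stated assumptions: `looks_like_http_url` is a hypothetical stand-in for the project's `is_valid_url` (whose implementation is not shown in this thread), `validate_urls` is an invented helper, and the check is deliberately simple (it does not handle IPv6 literals properly).

```rust
/// Hypothetical, std-only stand-in for `is_valid_url`; the real helper may
/// behave differently.
pub fn looks_like_http_url(input: &str) -> bool {
    // Accept only the http and https schemes.
    let rest = match input
        .strip_prefix("http://")
        .or_else(|| input.strip_prefix("https://"))
    {
        Some(rest) => rest,
        None => return false,
    };
    // The authority is everything before the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop optional userinfo such as "user:pass@".
    let host_port = authority.rsplit('@').next().unwrap_or("");
    // Separate an optional port from the host.
    let (host, port) = match host_port.rsplit_once(':') {
        Some((host, port)) => (host, Some(port)),
        None => (host_port, None),
    };
    if host.is_empty() || host.chars().any(char::is_whitespace) {
        return false;
    }
    // A port, when present, must be a valid 16-bit number.
    match port {
        Some(p) => p.parse::<u16>().is_ok(),
        None => true,
    }
}

/// Validate both URL arguments up front and report which one is invalid.
pub fn validate_urls(qdrant_url: &str, query_server_url: &str) -> Result<(), String> {
    let named = [("qdrant_url", qdrant_url), ("query_server_url", query_server_url)];
    for (name, value) in named {
        if !looks_like_http_url(value) {
            return Err(format!("invalid URL for {}: {}", name, value));
        }
    }
    Ok(())
}
```

Running the check at startup means a bad query_server_url fails fast with a named argument, instead of surfacing later as a failed request.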

Summary of changes

  1. Added a new feature flag "search" that enables additional functionality related to searching.
  2. Introduced new CLI arguments for API key, query server URL, and search backend (all under the "search" feature). The api_key defaults to an empty string, while the query_server_url is a required argument.
  3. Added new static variable SEARCH_ARGUMENTS of type SearchArguments under the "search" feature. This variable is populated with the values from the corresponding CLI arguments and used for search operations.
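
A set-once static like the one in item 3 might look roughly like this. The sketch assumes `std::sync::OnceLock` and `String` fields; the actual patch may use a different cell type, field types, or initialization path, and `init_search_arguments` is a hypothetical helper. In the real code both items would sit behind `#[cfg(feature = "search")]`.

```rust
use std::sync::OnceLock;

/// Search settings taken from the CLI. Field names follow the summary above;
/// the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchArguments {
    pub api_key: String,
    pub query_server_url: String,
    pub search_backend: String,
}

/// Written once at startup, then read by request handlers.
pub static SEARCH_ARGUMENTS: OnceLock<SearchArguments> = OnceLock::new();

/// Hypothetical initializer: a missing API key falls back to an empty string,
/// mirroring the default described in item 2.
pub fn init_search_arguments(
    api_key: Option<String>,
    query_server_url: String,
    search_backend: String,
) -> Result<(), String> {
    let args = SearchArguments {
        api_key: api_key.unwrap_or_default(),
        query_server_url,
        search_backend,
    };
    // `set` fails if the value was already initialized.
    SEARCH_ARGUMENTS
        .set(args)
        .map_err(|_| "SEARCH_ARGUMENTS was already initialized".to_string())
}
```

Handlers can then call `SEARCH_ARGUMENTS.get()` and treat `None` as a startup error rather than unwrapping.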

src/utils.rs

Potential issues

  1. Issue: The is_valid_url function does not handle URLs that require authentication or special characters that are not percent-encoded correctly, which may cause the parsing to fail even for valid URLs.

  2. Issue: In the LogLevel implementation of From<LogLevel>, critical errors are mapped to the error level instead of a separate level filter. This may result in important messages being logged at a less critical level than intended.

  3. Issue: The SearchArguments struct's fields api_key and query_server_url are not validated before use, which may lead to runtime errors or security vulnerabilities if an invalid or unauthorized URL is supplied.

Summary of changes

  1. The patch introduces a new struct called SearchArguments under the "search" feature flag, which is used for search-related items that aren't directly supported by SearchConfig. This includes API keys, query server URLs, and search backend specifications.
  2. A function gen_chat_id() has been modified to generate a chat ID using a version 4 UUID instead of the previous method.
  3. The patch also includes updates to the attributes and implementations for a derivative struct at the end of the file, but without providing specific details about the changes.

juntao commented 2 weeks ago

@suryyyansh Please fix the CI failure. Thank you!

alabulei1 commented 1 week ago

Hi @suryyyansh

Could you please write documentation on how to use the search API server? Thanks. https://github.com/LlamaEdge/docs

Is there a GitHub repo called search-api-server? If so, could you please send us your repo link and add README.md about the project? Thanks.

apepkuss commented 21 hours ago

@suryyyansh Could you please fix the conflicts? Thanks a lot.