Open Undertone0809 opened 1 month ago
🚀 Feature Request
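Use Streamlit to build a web-based AI search engine with the following capabilities:
Implement the SOP for a vertical (domain-specific) AI search engine 👇
Determine three core questions:
Query rewrite before search:
RAG pipeline
Main engineering work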
For sources without a standard API, the source site's data has to be indexed. Incremental indexing goes through the source's own search box, and the backfill of existing content uses search-engine page snapshots. Obtaining the full data set of any single source is difficult.
Source weights combine system presets with updates driven by user clicks. When several sources are searched at once, the number of results returned per source and their initial ordering are based on these weights.
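A minimal sketch of how the preset weights plus click updates could work. Everything here is an assumption for illustration: the `SourceWeights` class, the `click_boost` increment, and the proportional split of the result budget are not part of any existing code in this project.

```python
from dataclasses import dataclass


@dataclass
class SourceWeights:
    """Per-source weights: start from system presets, grow with user clicks."""

    weights: dict[str, float]   # system presets (hypothetical values)
    click_boost: float = 0.5    # assumed increment applied on each click

    def record_click(self, source: str) -> None:
        # A click on a result raises the weight of the source it came from.
        self.weights[source] = self.weights.get(source, 0.0) + self.click_boost

    def ranked_sources(self) -> list[str]:
        # Initial ordering: highest-weighted source first.
        return sorted(self.weights, key=lambda s: self.weights[s], reverse=True)

    def allocate(self, total_results: int) -> dict[str, int]:
        # Split the result budget across sources in proportion to weight.
        total_weight = sum(self.weights.values())
        if total_weight <= 0:
            return {source: 0 for source in self.weights}
        exact = {s: total_results * w / total_weight for s, w in self.weights.items()}
        quota = {s: int(share) for s, share in exact.items()}
        # Hand the slots lost to rounding to the largest fractional shares.
        leftover = total_results - sum(quota.values())
        by_remainder = sorted(exact, key=lambda s: exact[s] - quota[s], reverse=True)
        for s in by_remainder[:leftover]:
            quota[s] += 1
        return quota
```

Calling `allocate(10)` before querying gives each source its result count, and `ranked_sources()` gives the order in which to interleave results before reranking. A real version would probably also decay old clicks so early favorites do not dominate forever.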
An efficient, fast reranking framework is needed, such as FlashRank.
Split the retrieved content into chunks and store them in a vector database. When the context is attached to an LLM request, include only the chunks selected by similarity matching instead of sending everything wholesale.
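A rough sketch of the chunk, store, and match flow, using only the standard library. The bag-of-words `embed` function and the `InMemoryVectorStore` class are stand-ins I made up for illustration; a real setup would call an embedding model and an actual vector database instead.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    # Fixed-size word windows; the chunk size is an assumed default.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real embedding model.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._items.append((chunk, embed(chunk)))

    def top_k(self, query: str, k: int = 3) -> list[str]:
        # Return the k chunks most similar to the query.
        q = embed(query)
        scored = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in scored[:k]]


def build_prompt(query: str, store: InMemoryVectorStore, k: int = 3) -> str:
    # Only the best-matching chunks are attached, not the full retrieved text.
    context = "\n---\n".join(store.top_k(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The point of `build_prompt` is that the LLM request carries `k` chunks at most, so the context size stays bounded no matter how much content was retrieved.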
Periodically analyze historical queries, extract trending search keywords, and build a keyword library. For queries that hit the keyword library, the retrieve step is served from cache.
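One possible shape for the keyword library and the cached retrieve step, as a sketch only. The stopword list, the `top_n` cutoff, the order-insensitive cache key, and the `CachedRetriever` wrapper are all assumptions; `retrieve` is whatever function actually queries the sources.

```python
import re
from collections import Counter
from typing import Callable

# Small assumed stopword list; a real one would be language-specific.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "for", "how", "what", "is"}


def tokenize(query: str) -> list[str]:
    return [t for t in re.findall(r"\w+", query.lower()) if t not in STOPWORDS]


def extract_hot_keywords(history: list[str], top_n: int = 3) -> set[str]:
    # Count terms across historical queries and keep the most frequent ones.
    counts = Counter(term for query in history for term in tokenize(query))
    return {term for term, _ in counts.most_common(top_n)}


class CachedRetriever:
    """Wraps a retrieve function; queries that hit the keyword library are cached."""

    def __init__(self, retrieve: Callable[[str], list[str]], hot_keywords: set[str]) -> None:
        self._retrieve = retrieve
        self._hot = hot_keywords
        self._cache: dict[str, list[str]] = {}

    def search(self, query: str) -> list[str]:
        terms = tokenize(query)
        key = " ".join(sorted(set(terms)))  # word-order-insensitive cache key
        is_hot = any(term in self._hot for term in terms)
        if is_hot and key in self._cache:
            return self._cache[key]
        results = self._retrieve(query)
        if is_hot:
            self._cache[key] = results
        return results
```

`extract_hot_keywords` would run on a schedule over the query log, and the resulting set gets passed to `CachedRetriever`. Cache expiry is left out here and would be needed so hot queries do not return stale results.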