eosphoros-ai / DB-GPT

AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
http://docs.dbgpt.cn
MIT License
13.24k stars 1.75k forks

[RAG] Is the RAG in the code used to match and retrieve the query and the entire schema information? #1828

Open chuangzhidan opened 1 month ago

chuangzhidan commented 1 month ago

Description

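Retrieval quality is poor, so I would like to change it to retrieve on, for example, table names and column names, with each table's columns split into separate entries. (I don't know which file starts the embedding; I want to embed only specific information.) Then, based on the retrieved information and its id, I would pull the rest of the table schema information.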
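The idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: `SCHEMA`, `ColumnChunk`, `split_schema`, `retrieve_table_ids`, and `full_schema` are hypothetical names, not part of DB-GPT, and token overlap stands in for real embedding similarity.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ColumnChunk:
    """One retrievable unit: a single column, tagged with its table id."""

    table_id: str
    text: str


# Hypothetical schema; in practice this would come from the database metadata.
SCHEMA: Dict[str, List[str]] = {
    "orders": ["order_id", "user_id", "total_amount", "created_at"],
    "users": ["user_id", "user_name", "email"],
}


def split_schema(schema: Dict[str, List[str]]) -> List[ColumnChunk]:
    """Emit one chunk per column so only table and column names are indexed."""
    return [
        ColumnChunk(table_id=table, text=f"{table} {column}")
        for table, columns in schema.items()
        for column in columns
    ]


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it on underscores and whitespace."""
    return set(text.lower().replace("_", " ").split())


def retrieve_table_ids(query: str, chunks: List[ColumnChunk], top_k: int = 3) -> List[str]:
    """Rank chunks by token overlap (a stand-in for embedding similarity)."""
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(chunk.text)), chunk) for chunk in chunks]
    # Keep only chunks that share at least one token with the query.
    matches = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: pair[0],
        reverse=True,
    )
    table_ids: List[str] = []
    for _, chunk in matches[:top_k]:
        if chunk.table_id not in table_ids:
            table_ids.append(chunk.table_id)
    return table_ids


def full_schema(table_ids: List[str], schema: Dict[str, List[str]]) -> List[str]:
    """Use the retrieved table ids to fetch the complete table definitions."""
    return [f"{table}({', '.join(schema[table])})" for table in table_ids]


chunks = split_schema(SCHEMA)
hits = retrieve_table_ids("total amount of each order", chunks, top_k=2)
print(full_schema(hits, SCHEMA))
```

Indexing small per-column chunks keeps each vector focused on one name, while the table id carried on every chunk lets the full schema be looked up after retrieval instead of being embedded itself.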

dusx1981 commented 1 month ago

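ChromaStore is the default store for relevance retrieval. When the project starts, the system reads the target database's metadata, mainly the tables' column information, and writes it to a storage file named dbname+_profile. If you want to store your own information, you need to add that logic yourself.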
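A rough sketch of that startup step, assuming "dbname+_profile" means the database name with a `_profile` suffix. The `"{table_name}({columns})"` template is the default that appears in `_parse_db_summary` later in this thread; `profile_name`, `table_summaries`, `store_profile`, and the in-memory `collections` dict are hypothetical stand-ins, not DB-GPT or Chroma APIs.

```python
from typing import Dict, List

# Default template quoted later in this thread.
SUMMARY_TEMPLATE = "{table_name}({columns})"

# In-memory stand-in for the vector store: collection name -> stored texts.
collections: Dict[str, List[str]] = {}


def profile_name(db_name: str) -> str:
    """Assumed reading of 'dbname+_profile': database name plus a suffix."""
    return f"{db_name}_profile"


def table_summaries(tables: Dict[str, List[str]]) -> List[str]:
    """Render one summary string per table from its column names."""
    return [
        SUMMARY_TEMPLATE.format(table_name=name, columns=", ".join(columns))
        for name, columns in tables.items()
    ]


def store_profile(db_name: str, tables: Dict[str, List[str]]) -> str:
    """Store the table summaries under the database's profile collection."""
    name = profile_name(db_name)
    collections[name] = table_summaries(tables)
    return name


stored_under = store_profile(
    "sales",
    {"users": ["id", "name"], "orders": ["id", "user_id", "total"]},
)
print(stored_under, collections[stored_under])
```

Storing custom information would then mean changing what `table_summaries` emits, for example one entry per column instead of one per table.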

chuangzhidan commented 1 month ago

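reads the target database's metadata, mainly the tables' column information, and writes it to

Thanks. Do you know which script does the reading and storing? I couldn't find it.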

Aries-ckt commented 4 weeks ago

@chuangzhidan DBSummary

chuangzhidan commented 4 weeks ago

@chuangzhidan DBSummary

That is all there is, and I can't tell anything from it:

from typing import Iterable, Optional


class DBSummary:
    """Database summary class."""

    def __init__(self, name: str):
        """Create a new DBSummary."""
        self.name = name
        self.summary: Optional[str] = None
        self.tables: Iterable[str] = []
        self.metadata: Optional[str] = None

    def get_summary(self) -> Optional[str]:
        """Get the summary."""
        return self.summary

Aries-ckt commented 4 weeks ago

@chuangzhidan DBSummaryClient and RdbmsSummary

def _parse_db_summary(
    conn: BaseConnector, summary_template: str = "{table_name}({columns})"
) -> List[str]:
    """Get db summary for database.

    Args:
        conn (BaseConnector): database connection
        summary_template (str): summary template
    """
    tables = conn.get_table_names()
    table_info_summaries = [
        _parse_table_summary(conn, summary_template, table_name)
        for table_name in tables
    ]
    return table_info_summaries

chuangzhidan commented 4 weeks ago

@chuangzhidan DBSummaryClient and RdbmsSummary

I don't see where the vectorization happens in the first place. This only extracts the information, right?

Aries-ckt commented 3 weeks ago

DBSchemaAssembler, persist()
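
To illustrate how extraction and vectorization fit together, here is a toy version of the "assemble, then persist" pattern. It does not reproduce DB-GPT's `DBSchemaAssembler`: `ToySchemaAssembler`, `ToyVectorStore`, and `toy_embed` are hypothetical, and the hashing-based embedding is only a placeholder for a real embedding model.

```python
import hashlib
import math
from typing import Dict, List, Tuple


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Hash each token into a fixed-size bag-of-words vector, then L2-normalize."""
    vector = [0.0] * dim
    for char in "(),":
        text = text.replace(char, " ")
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


class ToyVectorStore:
    """In-memory stand-in for a vector store such as Chroma."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[List[float], str]] = {}

    def add(self, chunk_id: str, vector: List[float], text: str) -> None:
        self.rows[chunk_id] = (vector, text)

    def search(self, query: str, top_k: int = 1) -> List[str]:
        """Return the stored texts with the highest dot product to the query."""
        query_vector = toy_embed(query)
        ranked = sorted(
            self.rows.values(),
            key=lambda row: sum(a * b for a, b in zip(query_vector, row[0])),
            reverse=True,
        )
        return [text for _, text in ranked[:top_k]]


class ToySchemaAssembler:
    """Holds extracted table summaries; persist() embeds and stores them."""

    def __init__(self, summaries: List[str], store: ToyVectorStore) -> None:
        self.summaries = list(summaries)
        self.store = store

    def persist(self) -> List[str]:
        """Embed every summary, write it to the store, and return the ids."""
        ids = []
        for index, summary in enumerate(self.summaries):
            chunk_id = f"chunk-{index}"
            self.store.add(chunk_id, toy_embed(summary), summary)
            ids.append(chunk_id)
        return ids


summaries = ["users(id, name, email)", "orders(id, user_id, total)"]
store = ToyVectorStore()
ids = ToySchemaAssembler(summaries, store).persist()
print(ids, store.search("email of a user"))
```

In this sketch the extraction step only produces strings; nothing is vectorized until `persist()` runs, which mirrors the split between the summary-parsing functions quoted earlier and the persist step named in this reply.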