eosphoros-ai / DB-GPT

AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
http://docs.dbgpt.cn
MIT License
13.63k stars 1.82k forks source link

[Bug] [datasource] Unable to modify database table structure after executing query #1389

Closed u-zhaoweijun closed 5 months ago

u-zhaoweijun commented 6 months ago

Search before asking

Operating system information

Linux

Python version information

3.10

DB-GPT version

latest release

Related scenes

Installation Information

Device information

Device: GPU, GPU Count: 1, GPU Memory: 24G

Models information

LLM: chatglm3-6b, Embedding model: text2vec-large-chinese

What happened

When I asked the question shown in the figure below, DB-GPT did not execute SQL with the correct column name, resulting in an error. I tried to modify the column name to make the model work properly, but the metadata of the table seems to be locked and cannot execute the modification command. image

What you expected to happen

The correct column names should be used for querying based on the given metadata information, and even if the query is indeed incorrect, the metadata of the table should not be locked.

How to reproduce

Check if locks and transactions are used correctly, and release connections correctly.

Additional context

No response

Are you willing to submit PR?

yyhhyyyyyy commented 6 months ago

@u-zhaoweijun Hello, I believe that during the first run, the database information has already been stored in the vectorized database. You can check the "pilot" directory; there should be cache files underneath. You can try deleting the corresponding data files in the folder, and then rerun DB-GPT.

u-zhaoweijun commented 6 months ago

Thank you very much, it works! But there are two more questions here:

  1. After deleting the metadata files stored in the vectorized database, I reran DB-GPT again and received no comments in the metadata (attached below are logs before and after deletion);
  2. Is it possible to replace the vectorized database with PostgreSQL, as it seems not mentioned in the official documentation.

Before ModelRequest(model='chatglm3-6b', messages=[ModelMessage(role='system', content='\n请根据用户选择的数据库和该库的部分可用表结构定义来回答用户问题.\n数据库名:\n dbgpt_case\n表结构定义:\n [\'ods_crms_dm_cr_target_sum_f_d_orc(day_id (统计日期), org_code (组织编码), org_name (组织名称), month_type (周期名称), contract_amount (合同金额), contract_credit_amount (合同授信金额), credit_business_ratio (授信业务占比), his_overdue_amount (正在发生的逾期金额+逾期后核销金额), credit_amount (授信发生额), sum_not_check_amount (实际占用(累计)), rece_amount (应收账款(累计)), prepay_amount (预付账款(累计)), rece_turnover_days (应收账款周转天数), prepay_turnover_days (预付账款周转天数), credit_avg_turnover_days (授信平均周转天数), overdue_amount_ratio (逾期金额占比), overdue_contract_nums (逾期合同笔数), credit_contract_nums (授信合同笔数), overdue_contract_ratio (逾期合同占比), overdue_days (最大逾期天数之和), contract_nums (合同数量), avg_overdue_days (平均逾期天数), bad_debt_ratio (坏账比率), ending_rece_amount (期末应收账款), ending_prepay_amount (期末预付账款), beginning_rece_amount (期初应收账款), beginning_prepay_amount (期初预付账款), credit_business_grow_ratio (授信业务规模增长比率), rece_grow_ratio (应收账款增长比率), prepay_grow_ratio (预付账款增长比率), credit_quota_amount (授信总额), beginning_credit_ptn_nums (期初授信客户数量), ending_credit_ptn_nums (期末授信客户数量), credit_ptn_grow_ratio (授信客户增长比率), currency (币种), etl_load_time (数据加载时间)), and table comment: 信用目标周期表\', \'ods_crms_dm_cr_trade_info_sum_f_d_orc(day_id (统计时间), bu_code (二级单位编码), bu_name (二级单位名称), ptn_code (客商编码), ptn_name (客商名称), month_type (周期类型), contract_credit_amount (合同授信金额), not_check_amount (累计未核销金额), max_overdue_days (最高逾期天数), max_not_check_amount (最高实际占用金额), period_days (期间天数), avg_actual_amount (平均实际占用), his_overdue_counts (历史逾期次数), his_overdua_amount (历史逾期金额), currency (币种), etl_load_time (数据加载时间)), and table comment: 客商交易周期信息表\', \'ods_crms_dm_cr_trade_info_current_f_d_orc(day_id (统计日期), bu_code (二级单位编码), bu_name (二级单位名称), ptn_code (客商编码), ptn_long_name (客商名称), trade_type (贸易类型拼接), overdue_amount (逾期金额), credit_actual_amount (实际授信占用金额), contract_actual_amount (实际合同占用金额), overdue_amt_ratio (逾期占用占比), currency (币种), etl_load_time (ETL加载时间)), and table comment: 客商交易信息时点\', \'ods_crms_dm_cr_target_current_f_d_orc(day_id (统计日期), org_code (组织编码), org_name (组织名称), sum_not_check_amount (未核销金额), sum_overdue_amount (逾期金额), overdue_amount_ratio (逾期占用比), aging_7_amount (7天以内账龄金额), aging_8_amount (8-15天以内账龄金额), aging_16_amount (16-30天以内账龄金额), aging_31_amount (31-60天以内账龄金额), aging_61_amount (61-90天以内账龄金额), aging_90_amount (90天以上以内账龄金额), aging_7_amount_ratio (7天以内账龄金额占比), aging_8_amount_ratio (8-15天以内账龄金额占比), aging_16_amount_ratio (16-30天以内账龄金额占比), aging_31_amount_ratio (31-60天以内账龄金额占比), aging_61_amount_ratio (61-90天以内账龄金额占比), aging_90_amount_ratio (90天以上账龄金额占比), overdue_days_7_amount (账款逾期7天以内金额), overdue_days_8_amount (账款逾期8天-14天金额), overdue_days_15_amount (账款逾期15天-30天金额), overdue_days_30_amount (账款逾期30天以上金额), overdue_days_7_ratio (账款逾期7天以内比率), overdue_days_8_ratio (账款逾期8天-14天比率), overdue_days_15_ratio (账款逾期15天-30天比率), overdue_days_30_ratio (账款逾期30天以上比率), ptn_nums (授信客户数量), credit_ptn_nums (授信客户数量), credit_ptn_nums_ratio (授信客户占比), currency (币种), etl_load_time (数据加载时间)), and table comment: 信用目标时点表\']\n\n约束:\n 1. 请根据用户问题理解用户意图,使用给出表结构定义创建一个语法正确的 mysql sql,如果不需要sql,则直接回答用户问题。\n 2. 除非用户在问题中指定了他希望获得的具体数据行数,否则始终将查询限制为最多 50 个结果。\n 3. 只能使用表结构信息中提供的表来生成 sql,如果无法根据提供的表结构中生成 sql ,请说:“提供的表结构信息不足以生成 sql 查询。” 禁止随意捏造信息。\n 4. 请注意生成SQL时不要弄错表和列的关系\n 5. 请检查SQL的正确性,并保证正确的情况下优化查询性能\n 6.请从如下给出的展示方式种选择最优的一种用以进行数据渲染,将类型名称放入返回要求格式的name参数值种,如果找不到最合适的则使用\'Table\'作为展示方式,可用数据展示方式如下: response_line_chart:used to display comparative trend analysis data\nresponse_pie_chart:suitable for scenarios such as proportion and distribution statistics\nresponse_table:suitable for display with many display columns or non-numeric columns\nresponse_scatter_plot:Suitable for exploring relationships between variables, detecting outliers, etc.\nresponse_bubble_chart:Suitable for relationships between multiple variables, highlighting outliers or special situations, etc.\nresponse_donut_chart:Suitable for hierarchical structure representation, category proportion display and highlighting key categories, etc.\nresponse_area_chart:Suitable for visualization of time series data, comparison of multiple groups of data, analysis of data change trends, etc.\nresponse_heatmap:Suitable for visual analysis of time series data, large-scale data sets, distribution of classified data, etc.\n用户问题:\n 查询逾期金额最大的组织\n请一步步思考并按照以下JSON格式回复:\n "{\\n \\"thoughts\\": \\"thoughts summary to say to user\\",\\n \\"sql\\": \\"SQL Query to run\\",\\n \\"display_type\\": \\"Data display method\\"\\n}"\n确保返回正确的json并且可以被Python json.loads方法解析.\n\n', round_index=0), ModelMessage(role='human', content='查询逾期金额最大的组织', round_index=0)], temperature=0.5, max_new_tokens=1024, stop=None, stop_token_ids=None, context_len=None, echo=False, span_id='1468cdc0-1cb8-4093-ac6c-d19f0992ffb5:ddb7acb6-a8d3-44fc-badb-dbbb128821e6', context=ModelRequestContext(stream=False, cache_enable=False, user_name=None, sys_code=None, conv_uid=None, span_id='1468cdc0-1cb8-4093-ac6c-d19f0992ffb5:ddb7acb6-a8d3-44fc-badb-dbbb128821e6', chat_mode='chat_with_db_execute', extra={}, request_id=None))

After ModelRequest(model='chatglm3-6b', messages=[ModelMessage(role='system', content='\n请根据用户选择的数据库和该库的部分可用表结构定义来回答用户问题.\n数据库名:\n dbgpt_case\n表结构定义:\n [(\'ods_crms_dm_cr_target_current_f_d_orc(day_id,sum_not_check_amount,ptn_nums,overdue_days_8_ratio,overdue_days_8_amount,overdue_days_7_ratio,overdue_da ... (292 characters truncated) ... aging_8_amount,aging_7_amount_ratio,aging_7_amount,aging_61_amount_ratio,aging_61_amount,aging_31_amount_ratio,aging_31_amount,aging_16_amount_ratio)\',)]\n\n约束:\n 1. 请根据用户问题理解用户意图,使用给出表结构定义创建一个语法正确的 mysql sql,如果不需要sql,则直接回答用户问题。\n 2. 除非用户在问题中指定了他希望获得的具体数据行数,否则始终将查询限制为最多 50 个结果。\n 3. 只能使用表结构信息中提供的表来生成 sql,如果无法根据提供的表结构中生成 sql ,请说:“提供的表结构信息不足以生成 sql 查询。” 禁止随意捏造信息。\n 4. 请注意生成SQL时不要弄错表和列的关系\n 5. 请检查SQL的正确性,并保证正确的情况下优化查询性能\n 6.请从如下给出的展示方式种选择最优的一种用以进行数据渲染,将类型名称放入返回要求格式的name参数值种,如果找不到最合适的则使用\'Table\'作为展示方式,可用数据展示方式如下: response_line_chart:used to display comparative trend analysis data\nresponse_pie_chart:suitable for scenarios such as proportion and distribution statistics\nresponse_table:suitable for display with many display columns or non-numeric columns\nresponse_scatter_plot:Suitable for exploring relationships between variables, detecting outliers, etc.\nresponse_bubble_chart:Suitable for relationships between multiple variables, highlighting outliers or special situations, etc.\nresponse_donut_chart:Suitable for hierarchical structure representation, category proportion display and highlighting key categories, etc.\nresponse_area_chart:Suitable for visualization of time series data, comparison of multiple groups of data, analysis of data change trends, etc.\nresponse_heatmap:Suitable for visual analysis of time series data, large-scale data sets, distribution of classified data, etc.\n用户问题:\n 查询逾期金额最大的组织\n请一步步思考并按照以下JSON格式回复:\n "{\\n \\"thoughts\\": \\"thoughts summary to say to user\\",\\n \\"sql\\": \\"SQL Query to run\\",\\n \\"display_type\\": \\"Data display method\\"\\n}"\n确保返回正确的json并且可以被Python json.loads方法解析.\n\n', round_index=0), ModelMessage(role='human', content='查询逾期金额最大的组织', round_index=1), ModelMessage(role='human', content='查询逾期金额最大的组织', round_index=0)], temperature=0.5, max_new_tokens=1024, stop=None, stop_token_ids=None, context_len=None, echo=False, span_id='e9894a22-a8fc-4347-9dac-19644c436304:4157654d-7831-4dde-82b9-7e4b9bbda468', context=ModelRequestContext(stream=False, cache_enable=False, user_name=None, sys_code=None, conv_uid=None, span_id='e9894a22-a8fc-4347-9dac-19644c436304:4157654d-7831-4dde-82b9-7e4b9bbda468', chat_mode='chat_with_db_execute', extra={}, request_id=None))

Table Structure

image

`

yyhhyyyyyy commented 6 months ago

@u-zhaoweijun You can first try rerunning DB-GPT and then check the logs again to see if there are any logs corresponding to the fields. If not, please delete the related data files under the “pilot” folder again, and then execute DB-GPT again to check.

u-zhaoweijun commented 6 months ago

OK, I remove the files and restart DB-GPT, it works. Thank you! By the way, should we provide more convenient ways in the future?

u-zhaoweijun commented 6 months ago

And how about the issue itself “Unable to modify database table structure after executing query ”? I use DB-GPT to chat data with MySQL database,and than I try to change the table structure,but MySQL didn't work and was ‘waiting for table metadata lock’.

github-actions[bot] commented 5 months ago

This issue has been marked as stale, because it has been over 30 days without any activity.

github-actions[bot] commented 5 months ago

This issue bas been closed, because it has been marked as stale and there has been no activity for over 7 days.

springga commented 5 months ago

I have the same issue.

One way to work around is to stop dbgpt before altering table. Delete the database connection in dbgpt and add it again. Then metadata will be updated.

chuangzhidan commented 2 months ago

@u-zhaoweijun Hello, I believe that during the first run, the database information has already been stored in the vectorized database. You can check the "pilot" directory; there should be cache files underneath. You can try deleting the corresponding data files in the folder, and then rerun DB-GPT.

can i kindly ask what particular directory? which setting file can specify this path?