Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Add skill function not work #1079

Open dustreturn opened 3 months ago

dustreturn commented 3 months ago

System Info

pandasai version: 2.0.23
platform: vscode

🐛 Describe the bug

import os
import pandasai.pandas as pd
from pandasai import Agent
from pandasai.skills import skill
import streamlit as st

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# Function doc string to give more context to the model for use this skill
@skill
def plot_salaries(names: list[str], salaries: list[int]):
    """
    Displays the bar chart having name on x-axis and salaries on y-axis using streamlit
    Args:
        names (list[str]): Employees' names
        salaries (list[int]): Salaries
    """
    import matplotlib.pyplot as plt

    plt.bar(names, salaries)
    plt.xlabel("Employee Name")
    plt.ylabel("Salary")
    plt.title("Employee Salaries")
    plt.xticks(rotation=45)
    plt.savefig("temp_chart.png")
    fig = plt.gcf()
    st.pyplot(fig)

# Get your FREE API key signing up at https://pandabi.ai.
# You can also configure it in your .env file.
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent([employees_df, salaries_df], memory_size=10)
agent.add_skills(plot_salaries)

# Chat with the agent
response = agent.chat("Plot the employee salaries against names")
print(response)

response: Unfortunately, I was not able to get your answers, because of the following error:

'NoneType' object has no attribute 'type'

dudesparsh commented 3 months ago

@dustreturn can you please confirm if you have used the right API key and passed it properly?

From the code you shared, it looks like the LLM call is not even being made. One likely reason, I think, is that your LLM key is not being passed or is not properly configured.
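A quick way to rule out the key configuration is to check the environment before building the Agent. The sketch below assumes the key is read from the `PANDASAI_API_KEY` environment variable, as in the snippet above. `require_api_key` is a hypothetical helper, not part of pandasai, and treating the placeholder `"YOUR_API_KEY"` as unset is also an assumption.

```python
import os


def require_api_key(var_name: str = "PANDASAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset."""
    key = os.environ.get(var_name, "").strip()
    # Assumption: the placeholder from the example counts as "not configured".
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(f"{var_name} is not set to a real key")
    return key
```

Calling `require_api_key()` right before `Agent(...)` would turn a silently missing key into an explicit error.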

TebooNok commented 3 months ago

Hi, I noticed this bug too: the pre-defined functions are not concatenated into the generated code, which leads to an error.

You can call the following functions that have been pre-defined for you:

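def find_closest_company(df, company_name):
    """
    Fuzzy lookup of a company name: based on company_name, return the row of data in df['公司名称'] whose name is the closest match:
    df (df.DataFrame): A table contains a column '公司名称'
    company_name (str): The name of the company that looking for
    """

def chinese_currency_to_num(currency_str):
    """
    Because all amounts in the data table df are in yuan (元), use this function to convert to a numeric value when the user's condition uses Chinese units such as 万元 (ten thousand yuan), 亿元 (hundred million yuan), or 万亿 (trillion). chinese_currency_to_num("38.16亿元") converts to 3816000000.0:
    currency_str: a chinese string format of a number
    """

def num_to_chinese_currency(num):
    """
    Before outputting the answer, use this to convert a float or int number to Chinese currency units, returned as a string. num_to_chinese_currency(3816225765.0) converts to "38.16亿元":
    num: a float or int number
    """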

Update this initial code:

# TODO: import the required dependencies
import pandas as pd

# Write code here

# Declare result var:
type (possible values "string", "number", "dataframe", "plot"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" }

QUERY

Which companies have a market value of over one trillion (一万亿)?

Variable dfs: list[pd.DataFrame] is already declared.

At the end, declare "result" variable as a dictionary of type and value.

If you are asked to plot a chart, use "matplotlib" for charts, save as png.

Generate python code and return full updated code:
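The contract the prompt sets for `result` can be checked mechanically. This is a minimal sketch, assuming only the four type names listed in the prompt above; `check_result` is a hypothetical helper for illustration and not a pandasai function.

```python
ALLOWED_TYPES = {"string", "number", "dataframe", "plot"}


def check_result(result) -> bool:
    """Return True if `result` looks like {"type": <allowed>, "value": ...}."""
    return (
        isinstance(result, dict)
        and result.get("type") in ALLOWED_TYPES
        and "value" in result
    )
```

For example, `check_result({"type": "number", "value": 125})` passes, while a `result` that was never assigned (`None`) does not.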

2024-04-02 14:44:00 [INFO] Code generated:

            # TODO: import the required dependencies
import pandas as pd

# Assuming dfs[0] is the DataFrame we are working with
df = dfs[0]

# Filter companies with market value over 1 trillion (一万亿)
# Note: The market value in the dataframe is assumed to be in 元, so 1万亿 = 1e12元
trillion_companies = df[df['市值'] > 1e12]

# Prepare the result
result = {
    "type": "dataframe",
    "value": trillion_companies[['股票代码', '公司名称', '市值']]
}

# Convert the market value to a more readable format using num_to_chinese_currency function
result['value']['市值'] = result['value']['市值'].apply(num_to_chinese_currency)

result

2024-04-02 14:44:00 [INFO] Executing Step 4: CachePopulation
2024-04-02 14:44:00 [INFO] Executing Step 5: CodeCleaning
2024-04-02 14:44:00 [INFO] Code running:

df = dfs[0]
trillion_companies = df[df['市值'] > 1000000000000.0]
result = {'type': 'dataframe', 'value': trillion_companies[['股票代码', '公司名称', '市值']]}
result['value']['市值'] = result['value']['市值'].apply(num_to_chinese_currency)
result

2024-04-02 14:44:00 [INFO] Executing Step 6: CodeExecution
2024-04-02 14:44:00 [ERROR] Failed with error: Traceback (most recent call last):
  File "/root/anaconda3/envs/chatdb/lib/python3.9/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
    result = self.execute_code(input, code_context)
  File "/root/anaconda3/envs/chatdb/lib/python3.9/site-packages/pandasai/pipelines/chat/code_execution.py", line 170, in execute_code
    exec(code, environment)
  File "", line 4, in
NameError: name 'num_to_chinese_currency' is not defined
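
The failure in the traceback can be reproduced without pandasai: `exec(code, environment)` only resolves names that are present in `environment` (plus builtins), so a helper that is advertised in the prompt but never placed in that dictionary raises `NameError`. The following is a minimal sketch of that mechanism using plain `exec`. The `run` helper, the simplified `num_to_chinese_currency` body, and the idea of passing skills in as a dictionary are assumptions for illustration, not pandasai's actual implementation.

```python
def num_to_chinese_currency(num):
    # Simplified stand-in for the skill named in the log (assumed body).
    return f"{num / 1e8:.2f}亿元"


code = "result = num_to_chinese_currency(3816225765.0)"


def run(code, skills=None):
    """Execute `code` in a fresh namespace, optionally exposing `skills`."""
    environment = dict(skills or {})
    exec(code, environment)
    return environment["result"]


try:
    run(code)  # the helper is not in the namespace
    failed = False
except NameError:
    failed = True

# Exposing the helper by name lets the same code run.
value = run(code, {"num_to_chinese_currency": num_to_chinese_currency})
```

Here `failed` ends up `True` for the first call, and the second call succeeds once the function is present in the execution namespace, which matches the symptom reported above.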